Shyam P.

Proposal to Combine findTrainingData and Label Phases into a Unified Iterative Workflow for Efficient Training Data C...

Conceptual question: while setting up matching, I ran the findTrainingData and label phases. However, when I proceeded to train, I encountered the following message:

24/12/06 12:16:31 WARN Client: Apologies for this message. Zingg has encountered an error. Unable to train as insufficient training data found. Training data has 14 matches and 1 non-match. Please run findTrainingData and label until you have sufficient labeled data to build the models.

Wouldn’t it be more efficient to combine findTrainingData and label into a single integrated phase that iteratively identifies and labels candidate pairs until there is sufficient training data for the downstream phases? Merging these two related phases into one cohesive, iterative process would let users collect enough training data without manually switching between phases, which can feel disjointed and repetitive.
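To make the proposal concrete, here is a minimal sketch of the loop I have in mind. It is not Zingg code: `find_pairs` and `label_pairs` are hypothetical stand-ins for the findTrainingData and label phases, and the thresholds (10 matches, 10 non-matches, at most 20 rounds) are assumptions picked for illustration, not values Zingg actually enforces.

```python
from typing import Callable, Tuple


def collect_training_data(
    find_pairs: Callable[[], None],
    label_pairs: Callable[[], Tuple[int, int]],
    min_matches: int = 10,       # assumed threshold, not Zingg's real one
    min_non_matches: int = 10,   # assumed threshold, not Zingg's real one
    max_rounds: int = 20,        # safety cap so the loop always terminates
) -> Tuple[int, int, int]:
    """Alternate the two steps until both label counts reach their minimum.

    find_pairs stands in for the findTrainingData phase. label_pairs stands in
    for the label phase and returns the cumulative (matches, non_matches).
    Returns (matches, non_matches, rounds_run).
    """
    matches = non_matches = 0
    rounds = 0
    while rounds < max_rounds and (
        matches < min_matches or non_matches < min_non_matches
    ):
        find_pairs()
        matches, non_matches = label_pairs()
        rounds += 1
    return matches, non_matches, rounds


def make_stub_labeler(start_matches: int, start_non_matches: int):
    """Fake labeler: every round adds 2 matches and 3 non-matches."""
    state = {"m": start_matches, "n": start_non_matches}

    def label_pairs() -> Tuple[int, int]:
        state["m"] += 2
        state["n"] += 3
        return state["m"], state["n"]

    return label_pairs


# Start from the counts in the warning above: 14 matches, 1 non-match.
result = collect_training_data(lambda: None, make_stub_labeler(14, 1))
print(result)
```

With the stub labeler the loop stops as soon as the non-match count catches up, so the user never has to decide when to switch phases.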

1 Comment

Posted in General

Shyam P.

Is the Zingg 0.4.1 release on the cards?

1 Comment

Commented on https://global2024.pydata.org/cfp/talk/XMU9X9/ · Posted in General

Shyam P.

Sonal G., was I able to answer your question?

Commented on https://global2024.pydata.org/cfp/talk/XMU9X9/ · Posted in General

Shyam P.

I have yet to attend the talk, but after a quick investigation, this is what I understood. Attached is a quick high-level component flow diagram explaining the overall steps. I am not sure whether I am understanding your question correctly. Aliases and entity linking appear to be manually curated. For example:

{"entity_id":"a1","name":"Machine learning","description":"Machine learning (ML) is the scientific study of algorithms and statistical models..."}
{"entity_id":"a2","name":"Meta Language","description":"ML (\"Meta Language\") is a general-purpose functional programming language. It has roots in Lisp, and has been characterized as \"Lisp with types\"."}

The entities a1 and a2 share a common alias, ML. Disambiguation is done at the Linking Process layer using the surrounding context.
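A rough sketch of what context-based disambiguation could look like for these two entries. This is only my illustration, not the method from the talk: it scores each candidate by plain word overlap between the mention's context and the entity description, and the `ALIASES` table and the `link` function are names I made up for the example.

```python
import re

# The two entries from above, descriptions kept exactly as quoted.
ENTITIES = [
    {
        "entity_id": "a1",
        "name": "Machine learning",
        "description": "Machine learning (ML) is the scientific study of "
                       "algorithms and statistical models...",
    },
    {
        "entity_id": "a2",
        "name": "Meta Language",
        "description": 'ML ("Meta Language") is a general-purpose functional '
                       "programming language. It has roots in Lisp, and has "
                       'been characterized as "Lisp with types".',
    },
]

# Hypothetical, manually curated alias table: one alias, two candidates.
ALIASES = {"ML": ["a1", "a2"]}


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link(alias, context):
    """Return the id of the candidate whose description shares the most
    words with the context, or None if the alias is unknown."""
    candidate_ids = ALIASES.get(alias, [])
    candidates = [e for e in ENTITIES if e["entity_id"] in candidate_ids]
    context_words = tokens(context)
    best = max(
        candidates,
        key=lambda e: len(context_words & tokens(e["description"])),
        default=None,
    )
    return best["entity_id"] if best else None


print(link("ML", "We trained statistical models with ML algorithms"))
print(link("ML", "ML is a functional programming language with types like Lisp"))
```

A real linker would presumably use a learned model over the context rather than word overlap, but the shape is the same: the alias narrows the candidates and the context picks one.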

Posted in General

Shyam P.

https://global2024.pydata.org/cfp/talk/XMU9X9/

4 Comments