Hi all, I’m currently working with Zingg version 3.3 and have successfully created a new model using my own dataset. I encountered a scenario where two records appear to represent the same individual:
Data 1: Name: C.V. RAVISEKAR VASUDEVAN, PAN: APPER5089R, Mobile: 9876543210
Data 2: Name: RAVISEKAR CV, PAN: APPER5089R, Mobile: 9876543210
Despite the strong similarities, especially the identical PAN and mobile number, the model classified them as unique records instead of a match. I defined the fields in the config in the same order (Name, PAN, Mobile), all with the fuzzy match type. Should Zingg match records like these, or am I missing something?
Hi! Zingg would very much work in this scenario. Please make sure your training covers some variations like these.
Yes, Sonal G. While labeling, pairs like these never came up in the samples drawn from our input, so we could not label them. We are now trying to update the training sample output by manually appending the missing cases. Is this the correct approach, or are there alternative options we should consider?
Yes, just add 4-5 samples of these types through trainingSamples and you should be good to go. It may need a bit of tweaking.
Let us know how it goes 🙂
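The suggestion above can be sketched in Python as follows. This is only an illustration under stated assumptions: the z_cluster and z_isMatch columns follow Zingg's documented convention for training samples as I understand it (both records of a pair share one z_cluster value, and z_isMatch is 1 for a match and 0 for a non-match). The Name, PAN and Mobile columns mirror the fields in this thread, and the helper names make_pair and to_csv are hypothetical. The column order and format of the real file must agree with the trainingSamples schema in your own config.

```python
import csv
import io

# Assumed column layout: Zingg label columns first, then the fields from this thread.
FIELDS = ["z_cluster", "z_isMatch", "Name", "PAN", "Mobile"]


def make_pair(cluster_id, is_match, rec_a, rec_b):
    """Return two rows that share one z_cluster id and one z_isMatch label."""
    return [
        {"z_cluster": cluster_id, "z_isMatch": is_match, **rec}
        for rec in (rec_a, rec_b)
    ]


def to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# The pair from this thread, labelled as a match (z_isMatch = 1).
rows = make_pair(
    1,
    1,
    {"Name": "C.V. RAVISEKAR VASUDEVAN", "PAN": "APPER5089R", "Mobile": "9876543210"},
    {"Name": "RAVISEKAR CV", "PAN": "APPER5089R", "Mobile": "9876543210"},
)
csv_text = to_csv(rows)
print(csv_text)
```

Each further pair would get its own z_cluster value. The resulting file would then be referenced from the trainingSamples entry of the config.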
Hi Sonal G., we have approximately 8,000 records in total. Out of these, we labelled 140 records: 54 were matched, 80 were unmatched, and 6 were marked as not sure. However, during the exportModel phase, we encountered the following error:
ValueError: Some of types cannot be determined after inferring
org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:103)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at zingg.PeekModel.execute(PeekModel.java:42)
    at zingg.client.Client.execute(Client.java:239)
    at zingg.client.Client.main(Client.java:180)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2025-04-10 17:25:33,066 [main] WARN zingg.client.util.Email - Unable to send email Illegal address
2025-04-10 17:25:33,067 [main] WARN zingg.client.Client - Apologies for this message. Zingg has encountered an error. User application exited with 1
zingg.client.ZinggClientException: User application exited with 1
    at zingg.PeekModel.execute(PeekModel.java:47)
    at zingg.client.Client.execute(Client.java:239)
    at zingg.client.Client.main(Client.java:180)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Could this be due to an insufficient number of labelled records for the given input size? Is there a recommended approach to handle this scenario?
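For context, the ValueError at the top of this trace is the message PySpark raises when it infers a schema from rows and some column contains only None values, so no type can be determined for it. Whether that is what happens inside exportModel here is an assumption, not something the trace confirms. As a rough way to check labelled rows for such columns before handing them to Spark, a minimal sketch (the helper name all_null_columns and the sample rows are hypothetical, not taken from the actual data):

```python
def all_null_columns(rows):
    """Return the column names whose value is None in every row."""
    if not rows:
        return []
    columns = rows[0].keys()
    return [c for c in columns if all(r.get(c) is None for r in rows)]


# Hypothetical rows for illustration only: "Mobile" is empty everywhere.
sample = [
    {"Name": "RAVISEKAR CV", "PAN": "APPER5089R", "Mobile": None},
    {"Name": "C.V. RAVISEKAR VASUDEVAN", "PAN": None, "Mobile": None},
]
print(all_null_columns(sample))
```

If a column like this turns up, passing an explicit schema instead of relying on inference is the usual way around the error in plain PySpark code.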
Hmm, how are you running this? Which release? What command? Are the trainingSamples in line with the schema you have specified? If you need help, consider signing up for the office hours.
After completing the training and matching phases, we ran the exportModel phase so that we could add the missing scenarios to the training samples. We are using Zingg version 3.4, and the command we used is:
./scripts/zingg.sh --phase exportModel --conf Config.json --location models/100
Yes, the training samples are aligned with the schema defined in the configuration.
I think exportModel was broken in that release.
However, Zingg 3.4 seems to be working fine elsewhere. For example, the exportModel phase ran on another model with approximately 100,000 records, where we labeled 420 records, of which 56 were matched. We are facing issues only with this specific case: approximately 8,000 records in total, of which we labeled 140 (54 matched, 80 unmatched, and 6 marked as "not sure"). I have a question: could this issue be related to an insufficient number of labeled records for the given input size? If the exportModel phase is broken in the current release, would it be advisable to try the 4.0 release to resolve this issue?
I tried running the exportModel phase using the Zingg 4.0 release, but it fails with a NullPointerException.
exportModel is something we have fixed in 0.5.0, and we should have a release next week. In the interim, if you are familiar with PySpark, you can use it to read the model's Parquet-based data. Hope that helps.
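The PySpark workaround mentioned above might look roughly like this. It is a sketch under assumptions: the labelled pairs are assumed to live under <zinggDir>/<modelId>/trainingData/marked (here models/100, taken from the command earlier in the thread), so check the actual folder layout of your model directory first. The helper names marked_path, load_marked and peek are hypothetical. Only spark.read.parquet, printSchema and show are standard PySpark calls.

```python
import posixpath


def marked_path(zingg_dir, model_id):
    """Build the assumed location of the labelled pairs for a model."""
    return posixpath.join(zingg_dir, str(model_id), "trainingData", "marked")


def load_marked(spark, path):
    """Read the labelled pairs stored as Parquet into a Spark DataFrame."""
    return spark.read.parquet(path)


def peek(zingg_dir="models", model_id=100, rows=20):
    """Print the schema and the first few labelled rows of a model."""
    # Imported here so the path helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("peek-zingg-model").getOrCreate()
    try:
        df = load_marked(spark, marked_path(zingg_dir, model_id))
        df.printSchema()
        df.show(rows, truncate=False)
    finally:
        spark.stop()
```

Calling peek() from a PySpark-enabled Python session prints the stored columns, which also shows which fields a hand-written training sample file would need to match.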