Hi all, I am going through the training and labeling examples, but they cover only specific fields. I want to train on my own data with a different schema, for example with extra fields such as email or phone. When I try to add those fields, I get "field not found" errors in Spark. Is there an example of training and labeling data when the schema varies? Also, I see model IDs 100, 101, etc., which I am placing in the config file, but I don't see any documentation that explains the difference between them. I'd appreciate any help. Thanks.
Mickey, we have examples for varied schemas, one per dataset, in this folder: https://github.com/zinggAI/zingg/tree/main/examples. You can go through these and let me know if you have any queries or concerns.
Models with IDs 100, 101, etc. are the pre-trained models we have for the above examples. Each config has a different model ID: for example, the febrl dataset has modelId 100, the febrl120k dataset has modelId 101, and so on.
I went through the examples, but I don't see a single one that uses phone numbers or email addresses. Do I need to point to a specific model for that? An example config would be great to have!
You can refer to this for field definitions on phone and email ID: https://docs.zingg.ai/latest/stepbystep/configuration/field-definitions
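As a rough illustration of what a config with email and phone fields might look like, here is a minimal Python sketch that builds one and checks that every field definition names a column declared in the data schema. A mismatch between the two is one plausible cause of "field can't be resolved" errors in Spark, though the thread does not confirm it was the cause here. The key names (fieldDefinition, fieldName, matchType, fields, dataType, modelId, zinggDir) follow the configs in the examples folder. The column names, file paths, modelId value, and match types chosen for email and phone are assumptions, and `field`, `schema_columns`, and `unresolved_fields` are hypothetical helpers, not part of Zingg. Check the match types against the field-definitions page for your version.

```python
import json

# Hypothetical column layout for a dataset with contact fields; adjust to your data.
SCHEMA = "id string, fname string, lname string, email_address string, phone string"


def field(name, match_type, data_type="string"):
    """Build one field-definition entry in the shape used by the example configs."""
    return {
        "fieldName": name,
        "matchType": match_type,
        "fields": name,
        "dataType": data_type,
    }


config = {
    "fieldDefinition": [
        field("id", "dont_use"),
        field("fname", "fuzzy"),
        field("lname", "fuzzy"),
        field("email_address", "email"),  # assumption: match type for email columns
        field("phone", "exact"),          # assumption: exact match for phone numbers
    ],
    "data": [{
        "name": "customers",  # hypothetical dataset name and location
        "format": "csv",
        "props": {"location": "data/customers.csv", "delimiter": ",", "header": False},
        "schema": SCHEMA,
    }],
    "output": [{
        "name": "output",
        "format": "csv",
        "props": {"location": "/tmp/zinggOutput", "delimiter": ",", "header": True},
    }],
    "labelDataSampleSize": 0.5,
    "numPartitions": 4,
    "modelId": 200,  # assumption: a new id, so the pre-trained example models are not reused
    "zinggDir": "models",
}


def schema_columns(ddl):
    """Return the column names from a 'name type, name type' schema string."""
    return [part.split()[0] for part in ddl.split(",") if part.strip()]


def unresolved_fields(cfg):
    """List field-definition columns that are missing from any data source's schema."""
    missing = set()
    for source in cfg["data"]:
        columns = set(schema_columns(source["schema"]))
        for definition in cfg["fieldDefinition"]:
            if definition["fields"] not in columns:
                missing.add(definition["fields"])
    return sorted(missing)


problems = unresolved_fields(config)
if problems:
    print("Fields not present in the data schema:", problems)
else:
    print(json.dumps(config, indent=2))
```

If the check reports a name such as email_address, either the schema string or the field definition needs to change so that both use the same column name. If the input file supplies its own header row, the header names would need to match as well.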
I actually tried that earlier, which is why I raised the issue.
It gives a Spark SQL analysis exception: the email_address field can't be resolved.
Thanks, that worked. I could run trainMatch easily on my local machine with 200 records, but when I tried with 20k it simply hung. Maybe I need to try this on AWS and see.
20k should be doable locally, even half a million. What setup do you have?
I just used the default settings in zingg.conf and am running via Docker on an M1 Mac.
That's cool, glad to hear that!