Hi team, I'm using the docker 0.4.0 image with BigQuery. So far I was able to label and train a model.
I'm facing 2 problems that I cannot seem to solve (I also looked through the GH issues and the GitBook docs):
1. When running ./scripts/zingg.sh --conf /zingg-0.4.0/config/properties.json --phase exportModel --location /tmp/my_model_export I get a null-pointer error
(null_pointer.txt)
How can I fix this?
2. After matching the first batch of data, how do I add more data while keeping the same z_cluster for each entity? Existing records should keep their assigned z_cluster, and new matching records should be assigned that same existing z_cluster. I have new entries daily and want to run matching on a daily basis, stitching new records to an old z_cluster if they match.
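To make the desired behavior concrete, here is a minimal sketch in plain Python. It is not Zingg's API: the function name stitch_clusters, the record ids, and the "new_" prefix for unmatched records are all hypothetical, and it assumes the matches between new and existing records are already known. It also ignores the case where new records match each other but no existing record.

```python
# Hypothetical illustration of the desired outcome; this is not Zingg's API.
def stitch_clusters(existing, new_matches, new_ids):
    """Assign z_cluster values to new records without changing existing ones.

    existing:    dict mapping record id -> z_cluster already assigned
    new_matches: iterable of (new_id, matched_existing_id) pairs
    new_ids:     ids of all new records in the current batch
    """
    # Existing records keep the z_cluster they were given earlier.
    result = dict(existing)

    # A new record that matches an existing one inherits its z_cluster.
    for new_id, old_id in new_matches:
        result[new_id] = existing[old_id]

    # New records without any match start a cluster of their own.
    for record_id in new_ids:
        if record_id not in result:
            result[record_id] = f"new_{record_id}"

    return result


existing = {"a": "c1", "b": "c1", "c": "c2"}
stitched = stitch_clusters(existing, [("d", "a")], ["d", "e"])
```

Here record "d" ends up in cluster "c1" together with "a" and "b", while "e" gets a fresh cluster, and none of the previously assigned values change.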
I tried passing a union of the matched data and the new data as input, adding z_cluster and z_isMatch as additional columns (populated for the matched records and set to null for the new records):
"fieldDefinition": [
  {
    "fieldName": "z_cluster",
    "dataType": "string",
    "fields": "z_cluster",
    "matchType": "DONT_USE"
  },
  {
    "fieldName": "z_isMatch",
    "dataType": "integer",
    "fields": "z_isMatch",
    "matchType": "DONT_USE"
  },
  {
    "fieldName": "first_user_data_point",
    "dataType": "string",
    "fields": "first_user_data_point",
    "matchType": "EXACT,NULL_OR_BLANK"
  },
  {
    ...
  }
]
However, when running this, I get [AMBIGUOUS_REFERENCE] Reference `z_cluster` is ambiguous, could be: [`z_cluster`, `z_cluster`].
(ambiguous_z_cluster.txt)
Thank you for your help!