Questions on Setting Up and Managing Zingg Models with Multiple Databases and Retraining Frequency
Hi all, I'm looking for some answers to the following questions:
1. Let's say I have 3 databases, each coming from a different business unit. How exactly do I need to set up the data for Zingg to work? Do I provide separate long-format datasets that share the same columns (e.g. id, email, dob, etc.), or a single unified dataset?
2. I assume that if the model is trained on a given set of columns, e.g. id, email, dob, then any additional dataset would need those same columns, and I cannot just use a subset, e.g. email and dob only?
3. Using the enterprise solution, I can generate a unique ID which I can use across my 3 databases. I assume that once I have these IDs, I will need to do a MERGE INTO to join them back onto my data, using the attributes I fed to the model, e.g. id, email, dob?
4. How often should I be retraining the model? Is it typically a one-and-done thing, or do the enterprise model evaluation tools let me track model degradation over time? And then, assuming the model has degraded, would I retrain and do a similar upsert?
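To make questions 1 and 3 concrete, here is a minimal sketch of the workflow I have in mind, in plain Python rather than anything Zingg-specific. Everything in it is an assumption for illustration: the source names (`crm`, `billing`, `support`), the `source` and `cluster_id` columns, and the `matches` dictionary, which is only a stand-in for whatever the matcher actually outputs. It also assumes the output retains each record's original id, so the join back can use source plus id.

```python
# Hypothetical records from three business-unit databases, all sharing
# the same columns (id, email, dob).
crm = [{"id": "1", "email": "ann@example.com", "dob": "1990-01-01"}]
billing = [{"id": "7", "email": "ann@example.com", "dob": "1990-01-01"}]
support = [{"id": "3", "email": "bob@example.com", "dob": "1985-05-05"}]


def stack(sources):
    """Combine per-source record lists into one long-format list,
    tagging each row with the name of the source it came from."""
    unified = []
    for name, rows in sources.items():
        for row in rows:
            unified.append({"source": name, **row})
    return unified


def attach_cluster_ids(rows, matches):
    """Join cluster IDs back onto the rows by (source, id).
    Rows without a match get cluster_id = None."""
    return [
        {**row, "cluster_id": matches.get((row["source"], row["id"]))}
        for row in rows
    ]


unified = stack({"crm": crm, "billing": billing, "support": support})

# Stand-in for matcher output (hypothetical): (source, id) -> cluster_id.
matches = {
    ("crm", "1"): "c1",
    ("billing", "7"): "c1",
    ("support", "3"): "c2",
}

resolved = attach_cluster_ids(unified, matches)
```

In a warehouse, I would expect the second step to be the MERGE INTO or upsert mentioned in question 3, keyed on the same columns.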