Nathan E.

Commented on Estimating Zingg Model Runtime and Cost for Large... · Posted in Help Zingg

Sonal G., what exactly would be required to estimate this?

Estimating Zingg Model Runtime and Cost for Large Datasets with 5-15 Fields and Join Conditions

Hi. I have a client looking to use Zingg. They initially have two datasets (with two more coming online soon) of ~450 million and ~60 million records, respectively. They wish to model based on 5-15 fields, with the exact number depending on their join condition. They're looking for a rough estimate of the time, and therefore the cost, to run the model. Can anyone make a suggestion? The current hardware sizing documentation states:

80m records with 8-10 fields took less than 2 hours on 1 driver (128 GB RAM, 32 cores), 8 workers (224 GB RAM, 64 cores). This is a user-reported stat without any optimization.

We can (roughly) extrapolate from this. My guess is ~2-5 hours, but honestly I cannot be certain. Any guidance here would be deeply appreciated.
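For reference, here is a minimal sketch of the naive extrapolation from the quoted benchmark. It assumes runtime grows linearly with record count on the same cluster, which may well not hold for Zingg (field count, blocking, and data skew could all change the picture). The function names and the hourly rate are hypothetical placeholders, not anything from the Zingg docs.

```python
# Naive linear extrapolation from the user-reported benchmark quoted above.
# ASSUMPTION: runtime scales linearly with record count on the same cluster.

BENCHMARK_RECORDS = 80_000_000  # "80m records" from the sizing documentation
BENCHMARK_HOURS = 2.0           # "less than 2 hours", so this is an upper bound


def estimate_hours(records: int) -> float:
    """Scale the benchmark runtime proportionally to the record count."""
    return BENCHMARK_HOURS * records / BENCHMARK_RECORDS


def estimate_cost(records: int, cluster_hourly_rate: float) -> float:
    """Cost = estimated hours x hourly price of the whole cluster.

    cluster_hourly_rate is a hypothetical figure you would take from your
    own cloud pricing for 1 driver + 8 workers of the quoted size.
    """
    return estimate_hours(records) * cluster_hourly_rate


if __name__ == "__main__":
    datasets = {
        "450M dataset": 450_000_000,
        "60M dataset": 60_000_000,
        "combined": 510_000_000,
    }
    hypothetical_rate = 1.0  # placeholder: cost units per cluster-hour
    for label, n in datasets.items():
        hours = estimate_hours(n)
        cost = estimate_cost(n, hypothetical_rate)
        print(f"{label}: ~{hours:.2f} h, ~{cost:.2f} cost units")
```

Under that linear assumption the 450M dataset alone comes out at roughly 11 hours on the benchmark cluster, which is well above a 2-5 hour guess, so both numbers should be treated as very rough until someone with real experience weighs in.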

2 Comments

Posted in Help Zingg

Nathan E.

Questions on Setting Up and Managing Zingg Models with Multiple Databases and Retraining Frequency

Hi all, I'm looking for some answers to the following questions:

1. Let's say I have 3 databases, each coming from a different business unit. How exactly do I need to set up the data for Zingg to work? Do I provide two long-format datasets that look the same (e.g. have columns id, email, dob, etc.), or a single unified dataset?
2. I assume that if the model is trained on a given set of columns, e.g. id, email, dob, then a third dataset would need those same columns, and I could not use just a subset, e.g. email and dob only?
3. Using the enterprise solution, I can generate a unique ID to use across my 3 databases. I assume that once I have these IDs, I will need to run a MERGE INTO to join them back onto my data using the attributes I fed to the model, e.g. id, email, dob?
4. How often should I retrain the model? Is it typically a one-and-done thing, or do the enterprise model evaluation tools let me track model degradation over time? And, assuming the model has degraded, would I then retrain and do a similar upsert?

6 Comments