Alex B.

Commented on Migrating from AWS Glue Record Matching to Zingg:... · Posted in Help Zingg

This is how the metrics look for the Glue job we ran to train the model with 70k labels. CPU usage stays close to 0% the whole time.

Commented on Migrating from AWS Glue Record Matching to Zingg:... · Posted in Help Zingg

Alex B.

Thank you, scheduled some time!

Commented on Migrating from AWS Glue Record Matching to Zingg:... · Posted in Help Zingg

Alex B.

Thank you. We tried tuning labelDataSampleSize, but it didn't help. We are interested in the match enforcement feature you mentioned is available in the enterprise version. When could we meet to discuss our options?

Commented on Migrating from AWS Glue Record Matching to Zingg:... · Posted in Help Zingg

Alex B.

Thank you! We are going to try it.

Commented on Migrating from AWS Glue Record Matching to Zingg:... · Posted in Help Zingg

Alex B.

Thanks! We tried using the 10M labels in Glue with 64 G.8X workers, but the job failed after 10 minutes:

An error occurred while calling o166.execute. Job aborted due to stage failure: ResultStage 51 (collectAsList at BlockingTreeUtil.java:52) has failed the maximum allowable number of times: 4

We also tried a much smaller set of about 70k labels on 32 G.4X workers. It was still running after 11 hours, so we stopped it. Our label dataset contains multiple labels per cluster. Could that be an issue? Should we organize our labels in pairs instead?
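A rough way to see why cluster-grouped labels can get expensive: if the trainer consumes labelled record pairs (an assumption here, though many pairwise matchers work this way), a cluster of n records implies n(n-1)/2 matching pairs, and records in different clusters imply non-matching pairs. The minimal Python sketch below is not Zingg's API. The record ids, cluster ids, and the function names `clusters_to_pairs` and `pair_counts` are hypothetical. It expands (record_id, cluster_id) labels into pairs and counts them.

```python
from collections import Counter, defaultdict
from itertools import combinations


def clusters_to_pairs(labels):
    """Expand (record_id, cluster_id) labels into matching record pairs.

    Hypothetical helper: every two records that share a cluster_id become
    one positive pair. Assumes record ids are unique.
    """
    members = defaultdict(list)
    for record_id, cluster_id in labels:
        members[cluster_id].append(record_id)
    pairs = []
    # dicts keep insertion order, so clusters come out in first-seen order
    for records in members.values():
        # combinations() yields each unordered pair once: n*(n-1)/2 per cluster
        pairs.extend(combinations(records, 2))
    return pairs


def pair_counts(labels):
    """Return (matching, non_matching) pair counts implied by the labels."""
    sizes = Counter(cluster_id for _, cluster_id in labels)
    matching = sum(n * (n - 1) // 2 for n in sizes.values())
    total = len(labels) * (len(labels) - 1) // 2
    return matching, total - matching


if __name__ == "__main__":
    labels = [("r1", "A"), ("r2", "A"), ("r3", "A"), ("r4", "B"), ("r5", "B")]
    print(clusters_to_pairs(labels))
    # [('r1', 'r2'), ('r1', 'r3'), ('r2', 'r3'), ('r4', 'r5')]
    print(pair_counts(labels))  # (4, 6)
```

Under this assumption, a single cluster of 1,000 labelled records alone implies 499,500 matching pairs, so a few large clusters can dominate the effective size of the training set.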

Posted in Help Zingg

Alex B.

Migrating from AWS Glue Record Matching to Zingg: Handling Positive/Negative Labels and Large-Scale Training Feasibility

Hey, we are trying to migrate from AWS Glue Record Matching to Zingg. When we were using Glue, we trained a model with a small set of labels, but then we sent negative enforcement labels when running FindMatches to make sure some records were never clustered together. Is it possible to do something like this in Zingg? Is it possible to send positive/negative enforcement labels when finding matches/clusters? We have more than 10M labels to ensure that the right items are either clustered together or kept apart, depending on the case. Would it be feasible to train a model with that many labels? Thank you!
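If enforcement at match time isn't available, one workaround to evaluate is a post-processing check. This is a hypothetical step, not a Zingg feature. It compares the cluster id assigned to each record in the match output against a list of record pairs that must never be clustered together, then flags violations for review or splitting. The function name `find_violations` and the input shapes below are illustrative assumptions.

```python
def find_violations(assignments, cannot_link):
    """Return cannot-link pairs that were placed in the same cluster.

    Hypothetical helper.
    assignments: dict mapping record_id -> cluster_id from the match output
    cannot_link: iterable of (record_a, record_b) pairs that must stay apart
    Records missing from assignments are skipped rather than flagged.
    """
    violations = []
    for a, b in cannot_link:
        cluster_a = assignments.get(a)
        cluster_b = assignments.get(b)
        # a violation only exists if both records were clustered together
        if cluster_a is not None and cluster_a == cluster_b:
            violations.append((a, b, cluster_a))
    return violations


if __name__ == "__main__":
    assignments = {"r1": 1, "r2": 1, "r3": 2}
    cannot_link = [("r1", "r2"), ("r1", "r3"), ("r4", "r5")]
    print(find_violations(assignments, cannot_link))  # [('r1', 'r2', 1)]
```

This only detects violations. Deciding how to split an offending cluster is a separate policy choice, and with millions of constraints the same check would more likely be expressed as a join on record id in Spark than as a Python loop.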


Commented on Welcome @Spencer B. @Mark W. @Alex B. to Zingg! · Posted in Introduce Yourself

Alex B.

Thanks! We are exploring Zingg to replace the AWS Glue Record Matching feature.