Comparison of Zingg and AWS Managed Entity Resolution: Pros and Cons Guide

Zingg Community

12 comments


Sania G.
·
Here are some details about Zingg. Hope you find them useful:
◦ Training Data: Zingg uses a findTrainingData phase to search for edge cases in the data, which the user then labels to train the model. This approach minimizes user effort and allows models to be built and deployed quickly.
◦ User Feedback: The cycle of findTrainingData and labeling is designed to be iterative, so Zingg models keep improving through user feedback and become more accurate over time.
◦ Zingg works on AWS Glue: https://aws.amazon.com/blogs/big-data/entity-resolution-and-fuzzy-matches-in-aws-glue-using-the-zingg-open-source-library/
◦ Data Sources and Sinks: Zingg supports various data sources and sinks, including Databricks, Snowflake, JDBC (Postgres, MySQL), AWS S3, Cassandra, MongoDB, Neo4j, Parquet, BigQuery, and Exasol.
◦ Cloud Support: Zingg can be run on cloud platforms like AWS, Azure, Databricks, and Fabric.
◦ Scalability: Zingg can process large sample sizes. Because it judiciously selects training data and focuses on the most informative examples, it can achieve good performance even when the overall sample is large.
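To make the findTrainingData and label cycle above concrete, here is a rough sketch of how the phases could be sequenced from Python. It only builds the command lines and does not run them. It assumes the open-source distribution's `scripts/zingg.sh` launcher with its `--phase` and `--conf` options; the `config.json` path, the helper names, and the number of rounds are made up for illustration.

```python
import shlex

# Assumption: launcher path as shipped in the open-source Zingg distribution.
ZINGG_SCRIPT = "./scripts/zingg.sh"


def zingg_command(phase, conf_path):
    """Build the argument list for one Zingg phase (hypothetical helper)."""
    return [ZINGG_SCRIPT, "--phase", phase, "--conf", conf_path]


def training_cycle(conf_path, rounds=3):
    """Repeat findTrainingData + label a few times, then train and match."""
    commands = []
    for _ in range(rounds):
        # Each round surfaces new edge-case pairs and asks the user to label them.
        commands.append(zingg_command("findTrainingData", conf_path))
        commands.append(zingg_command("label", conf_path))
    # Once enough pairs are labeled, build the model and resolve the data.
    commands.append(zingg_command("train", conf_path))
    commands.append(zingg_command("match", conf_path))
    return commands


if __name__ == "__main__":
    # "config.json" is a placeholder for your own Zingg configuration file.
    for cmd in training_cycle("config.json", rounds=2):
        print(shlex.join(cmd))
```

Each printed line can be run by hand or passed to `subprocess.run`. How many labeling rounds you need depends on your data, so treat `rounds` as something to tune rather than a recommendation.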
❤️1
Mickey
·
I am still comparing, but I do see better results with Zingg, as I can train the models the way I want.
Sonal G.
·
Would love to see a quote on this ☺️ 😉
Mickey
·
Sure. Let me do some more testing. I also want to look at the real-time aspect of it. I see zingg.ai mentions real-time identity resolution. Is that a paid service?
Sonal G.
·
Zingg is a batch-based system so far, Mickey.
Mickey
·
Thanks! On zingg.ai, it is mentioned as below: "Continuously Updated Identity Graph Right Within Your Datalake And Warehouse". So I thought it was already doing real-time updates.
Sonal G.
·
I see. You can actually run periodic jobs on altered and new data and keep updating the index.
👍1
Sonal G.
·
That’s what we call incremental flow
Mickey
·
Got it. Thanks! For an incremental load, what happens to a record that belongs to an existing historical group ID? Is it assigned the old clusterid? How do I make sure that, on every run, similar records keep a consistent clusterid while genuinely new records get a different one?
Akshara S.
·
Check out the following links: 🔗 Zingg Incremental Flow - Learning from Data 🔗 Zingg Incremental Flow - Product Features
Mickey
·
Thanks! I went through the links above. What I understand is that the incremental logic is only available in the paid version, with Zingg ID. For open source, we need to run the full load again, and the clusterid may change after regrouping. Is that the correct understanding?
Sonal G.
·
Yes, you are correct.
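For anyone on the open-source version who still wants cluster IDs to look stable across full reloads, one possible workaround is to reconcile the IDs yourself after each run. The sketch below is plain Python and is not Zingg's incremental flow or any Zingg API. It assumes you can export a record-ID to cluster-ID mapping from the previous and the current run, and the function name and the `run_label` prefix are invented for illustration. Each new cluster inherits the old ID that most of its members had; clusters with no history, or whose old ID was already claimed, get a freshly minted ID.

```python
from collections import Counter, defaultdict


def carry_over_cluster_ids(previous, current, run_label="run"):
    """Map this run's clusters back onto earlier cluster IDs where possible.

    previous: dict of record id -> cluster id from the earlier run
    current:  dict of record id -> cluster id from the latest full run
    Returns a dict of record id -> reconciled cluster id for the latest run.
    """
    # Group the latest run's records by their freshly assigned cluster.
    members = defaultdict(list)
    for record_id, cluster in current.items():
        members[cluster].append(record_id)

    taken = set()                       # old IDs already handed out this run
    reserved = set(previous.values())   # never mint an ID that clashes with history
    counter = 0
    reconciled = {}

    # Larger clusters choose first, so a split keeps the old ID on the bigger part.
    ordered = sorted(members.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))
    for _, records in ordered:
        votes = Counter(previous[r] for r in records if r in previous)
        chosen = next((old for old, _ in votes.most_common() if old not in taken), None)
        if chosen is None:
            # No usable history: mint a new ID for this cluster.
            chosen = f"{run_label}-{counter}"
            while chosen in reserved:
                counter += 1
                chosen = f"{run_label}-{counter}"
            counter += 1
            reserved.add(chosen)
        taken.add(chosen)
        for r in records:
            reconciled[r] = chosen
    return reconciled


if __name__ == "__main__":
    old = {"a": "c1", "b": "c1", "c": "c2"}
    new = {"a": 7, "b": 7, "c": 9, "d": 9, "e": 11}
    print(carry_over_cluster_ids(old, new, run_label="run2"))
```

This only relabels clusters; it does not change which records are grouped together. Merges and splits still need a policy decision (here the larger piece keeps the old ID, and ties between old IDs are broken arbitrarily), so review those cases before relying on the IDs downstream.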
👍1
