How Does Zingg 0.3.4 Handle Null Values in Matching PII Contacts?
Hi Sonal G., our team is assessing the results of some tests we ran with Zingg 0.3.4 on a subset of PII contacts, and we have some questions about how the model handles null values.

It looks as though nulls may be ignored, with the model relying on the populated columns and adjusting their importance to identify matches. If that is the case, it could pose a risk. For instance, two contacts with the same first and last name but null values in other fields, such as email, should not be matched with high confidence. We want certain columns to carry more weight in match evaluation, and I assume much of this depends on the labelled training data and the iterative training phase.

So, returning to the initial question: how does the model handle NULL values? Is there a rebalancing based on which columns are populated? And would it be enough to iterate on the training phase with more labels to refine the column importance for our purposes? Thanks.
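For reference, here is a toy Python illustration of the concern. It is not how Zingg computes match scores. The field names, the per-field weights, the exact-match similarity and the neutral value of 0.5 are all hypothetical, chosen only to show how dropping null fields and renormalising the remaining weights can inflate a name-only match, compared with treating a null comparison as neutral evidence.

```python
from typing import Dict, Optional

# Hypothetical per-field weights, for illustration only (not learned by Zingg).
WEIGHTS: Dict[str, float] = {"first_name": 1.0, "last_name": 1.0, "email": 3.0}


def field_sim(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Toy exact-match similarity; returns None when either value is null."""
    if a is None or b is None:
        return None
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def score_skip_nulls(r1: Dict[str, Optional[str]], r2: Dict[str, Optional[str]]) -> float:
    """Weighted average over fields populated in both records.

    Null comparisons are skipped and the remaining weights are renormalised,
    i.e. the 'rebalancing on populated columns' behaviour we are asking about.
    """
    num = den = 0.0
    for field, w in WEIGHTS.items():
        s = field_sim(r1.get(field), r2.get(field))
        if s is None:
            continue
        num += w * s
        den += w
    return num / den if den else 0.0


def score_null_as_neutral(
    r1: Dict[str, Optional[str]], r2: Dict[str, Optional[str]], null_sim: float = 0.5
) -> float:
    """Weighted average over all fields; a null comparison counts as neutral evidence."""
    num = den = 0.0
    for field, w in WEIGHTS.items():
        s = field_sim(r1.get(field), r2.get(field))
        num += w * (null_sim if s is None else s)
        den += w
    return num / den


# Two contacts that agree only on name; email is null on both sides.
a = {"first_name": "Ana", "last_name": "Silva", "email": None}
b = {"first_name": "Ana", "last_name": "Silva", "email": None}

print(score_skip_nulls(a, b))       # 1.0: the name match looks "certain"
print(score_null_as_neutral(a, b))  # 0.7: the missing email lowers confidence
```

In the first scoring rule the heavily weighted email column simply disappears from the calculation, so the pair scores as high as a full match. That is the kind of behaviour we would like to rule out, or at least understand, for our data.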