The end goal is to match only about 300-400 unmatched 'new' entities (primarily on just one or two columns, such as Name and State) against a 'mastered' list of 12.5k entities. However, nearly 400k variations have already been matched to those 12.5k entities. Can you guide me on how to structure this data and set up the process in Zingg for this use case?