Understanding Unexpected Behavior in Exact Match String Field Due to Training Data Labeling
Hey, could someone please verify whether my understanding of the drivers for slightly unexpected behaviour I observed is correct? The situation: I have a string field (my_field) where an exact match is required and I set the matching strategy to exact in the config. The output contains a number of z clusters where my_field is the only non-null field for all records but the values in my_field are different for all records. They're similar (e.g. my_field = 'value1' and my_field = 'value2') but for this field, similarity is insignificant unless they're identical. My assumption: Examples in the training data must have caused this. Most likely, it contains a pair/pairs that were labelled as a match (for other reasons than the my_field values) that had similar my_field values. Does this sound right or am I overlooking something here?