Hi all, I have a question about the labelling file.
Regarding the no match entries, is it better to group them into similar pairs that don't match, for example:
z_cluster,z_isMatch,firstname,lastname,jobtitle,..
123,0,James,Sallivan,IT Specialist,..
123,0,Anthony,Sallivan,Data Engineer,..
456,0,Frank,Williams,Project Manager,..
456,0,Franco,William,Sr Project Manager,..
or is it also fine to have larger no match groups within the same z_cluster, since none of them match anyway, for example:
z_cluster,z_isMatch,firstname,lastname,jobtitle,..
123,0,James,Sallivan,IT Specialist,..
123,0,Anthony,Sallivan,Data Engineer,..
123,0,Frank,Williams,Project Manager,..
123,0,Franco,William,Sr Project Manager,..
Which of the two options would make the training more effective? Intuitively, I would go for the first option, but I'm wondering just in case... Thanks.