I have been playing with the example notebook, using both our company data and the test.csv from the repo, and I am not able to get any matches, even when I specify that the Name has to be EXACT for our company data. I have confirmed that the company dataset contains some records where the name is an exact match. I have run through about 40 iterations and am not getting any closer to a solution. Any thoughts or ideas on how I can move forward?
Thanks,
Sandy
Thanks Sonal. I will try that again, and report back. I did try many different values and didn't seem to get a better response, but will definitely try again and let you know.
Sure, please keep me posted, Sandy. Also, when you say 40 iterations, is that the number of records or the number of times you ran findTrainingData? You will usually start seeing some matches after the 3rd or 4th round, so hang in there if that's the case.
I have run at least 10 iterations with the same setup and have only seen 1 match. I must have something set up wrong, because I am asking for an exact match on name and it is still suggesting things that don't match.
Here is an example of what I see:
Oh. It seems the labels are not getting saved, and hence no learning has happened. There is an explicit cell to save the labels. What happens when you label and then run that cell?
I think I found my problem. I had been going to the cell right after we set up the performance parameters and selecting "run all below". I just ran each cell manually and independently, and now n_pos and n_neg are set correctly. I think that running the cell after the labeling step and getting an error somehow interrupted the process, even after I had actually labeled the data and run the save again.
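In case it helps anyone else who uses "run all below", here is a rough sketch of a guard cell that could sit right after the save-labels cell. It is plain Python and not part of Zingg: the helper names `labels_ready` and `require_labels` and the `min_each` threshold are made up for illustration, and it assumes n_pos and n_neg are ordinary integers holding the saved match and non-match label counts.

```python
def labels_ready(n_pos, n_neg, min_each=1):
    """Return True when at least `min_each` match and non-match labels are saved."""
    return n_pos >= min_each and n_neg >= min_each


def require_labels(n_pos, n_neg, min_each=1):
    """Raise so that 'run all below' stops here instead of continuing with no labels.

    Hypothetical helper: n_pos and n_neg are assumed to be the counts
    reported by the notebook after the save-labels cell has run.
    """
    if not labels_ready(n_pos, n_neg, min_each):
        raise RuntimeError(
            f"Only {n_pos} match and {n_neg} non-match labels saved; "
            "label more pairs and rerun the save cell before continuing."
        )
```

Calling `require_labels(n_pos, n_neg)` in its own cell should make a bulk run fail loudly at that point, rather than carrying on as if labels had been saved.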
Happy to see this sorted, Sandy C., and thanks for your continued effort on working with Zingg. If you feel the instructions can be improved, please do consider sending a pull request.