Issues Accessing Zingg Model File from S3 in AWS Glue Notebook for Full Dataset Matching
I have been trying out Zingg for the last few days. I installed Zingg 0.4.0 on my local machine and used it to label and train a model on a sample of my data. I am now trying to use this model to run the match on my full dataset from an AWS Glue notebook.

My data is in S3, and I placed the model and config file in S3 as well, but the Glue notebook had trouble reading the config and the model from S3. I then tried placing both in the /tmp folder of the Glue notebook. It appears to read the config from there, but not the model file. Some searches suggested that the model should be in S3 because the Glue workers cannot access /tmp. However, when I place the model in S3, Zingg fails, saying it is not able to read it. I tried both zingg-0.4.0.jar and zingg-0.3.4-SNAPSHOT.jar, and both failed with a similar error.

Could you please look into this and let us know how to proceed?
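
In case it helps narrow this down, here is a minimal sketch of a check that could confirm whether the model objects are visible in S3 from the notebook session itself, independent of Zingg. It assumes boto3 is available in the Glue notebook and that the model is stored as a set of objects under a single prefix. The helper names `split_s3_uri` and `list_model_objects` and the bucket and prefix in the usage note are hypothetical placeholders, not my actual paths.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an S3 URI such as 's3://bucket/some/prefix' into (bucket, prefix).

    Accepts the s3, s3a and s3n schemes; raises ValueError for anything else,
    for example a local path like /tmp/models.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_model_objects(uri, max_keys=20):
    """Return up to max_keys object keys found under the given S3 URI.

    Assumption: the notebook's IAM role is the one used for the lookup.
    boto3 is imported lazily so the URI helper works without it.
    """
    import boto3

    bucket, prefix = split_s3_uri(uri)
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=max_keys)
    # "Contents" is absent from the response when nothing matches the prefix.
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

Calling `list_model_objects("s3://my-bucket/zingg/models/")` with the real bucket and prefix should return the model's object keys. An empty list or an access error would point to a path or IAM permission problem rather than to Zingg. This only checks the notebook session, so the Glue workers could still behave differently.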