Logs when I run the trainMatch phase:
26/06/22 04:56:12 INFO Client:
26/06/22 04:56:12 INFO Trainer: Reading inputs for training phase ...
26/06/22 04:56:12 INFO Trainer: Initializing learning similarity rules
26/06/22 04:56:12 WARN PipeUtilReader: Reading Pipe [name=null, format=parquet, preprocessors=null, props={path=/tmp/zingg_dir/app_models/100/trainingData//marked/}]
26/06/22 04:56:14 WARN DSUtil: Read marked training samples
26/06/22 04:56:14 WARN DSUtil: No configured training samples
26/06/22 04:56:15 WARN BlockManager: Block rdd_12_0 already exists on this machine; not re-adding it
26/06/22 04:56:15 WARN Trainer: Training on positive pairs - 23
26/06/22 04:56:15 WARN Trainer: Training on negative pairs - 58
26/06/22 04:56:15 WARN PipeUtilReader: Reading Pipe [name=app, format=csv, preprocessors=null, props={path=/tmp/zingg_dir/nobids_apps.csv, header=false, delimiter=,}]
26/06/22 04:56:19 INFO Heuristics: **Block size **100 and total count was 279411
26/06/22 04:56:19 INFO Heuristics: Heuristics suggest 100
26/06/22 04:56:19 INFO BlockingTreeUtil: Learning indexing rules for block size 100
26/06/22 04:56:20 WARN PipeUtilWriter: Writing output Pipe [name=null, format=parquet, preprocessors=null, props={path=/tmp/zingg_dir/app_models/100/model/block/zingg.block}]
26/06/22 04:56:20 WARN TaskSetManager: Stage 26 contains a task of very large size (1193 KiB). The maximum recommended task size is 1000 KiB.
26/06/22 04:56:20 INFO Trainer: Learnt indexing rules and saved output at /tmp/zingg_dir/app_models
26/06/22 04:56:20 INFO ModelUtil: Learning similarity rules
26/06/22 04:56:20 INFO ModelUtil: Start reading internal configurations and functions
26/06/22 04:56:20 INFO ModelUtil: Finished reading internal configurations and functions
26/06/22 04:56:22 WARN InstanceBuilder: Failed to load implementation from:dev.ludovic.netlib.blas.JNIBLAS
26/06/22 04:56:43 INFO Trainer: Learnt similarity rules and saved output at /tmp/zingg_dir/app_models
26/06/22 04:56:43 INFO Trainer: Finished Learning phase
26/06/22 04:56:43 WARN PipeUtilReader: Reading Pipe [name=app, format=csv, preprocessors=null, props={path=/tmp/zingg_dir/nobids_apps.csv, header=false, delimiter=,}]
26/06/22 04:56:47 INFO Matcher: Read 1116587
26/06/22 04:56:47 WARN Blocker: Blocking model location is Pipe [name=null, format=parquet, preprocessors=null, props={path=/tmp/zingg_dir/app_models/100/model/block/zingg.block}]
26/06/22 04:56:47 WARN PipeUtilReader: Reading Pipe [name=null, format=parquet, preprocessors=null, props={path=/tmp/zingg_dir/app_models/100/model/block/zingg.block}]
26/06/22 04:56:47 INFO Matcher: Blocked
26/06/22 04:56:48 INFO SparkModel: threshold while predicting is 0.5
26/06/22 04:56:48 WARN CacheManager: Asked to cache already cached data.
26/06/22 04:56:48 WARN DAGScheduler: Broadcasting large task binary with size 1229.9 KiB