example:
is there a way to add rules to the config? for example: fuzzy match, except when the field has na
it sorted itself out
ahh ok i just let it keep running
thanks for the response, i get the following after labeling:
24/10/08 18:53:07 INFO Client:
24/10/08 18:53:10 WARN PipeUtil: Reading Pipe [name=accounts, format=parquet, preprocessors=null, props={location=s3a://wktaana-etl-dev/first-degree/sfdc/account}]
24/10/08 18:53:10 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
24/10/08 18:53:32 WARN SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
24/10/08 18:54:44 WARN TrainingDataFinder: Read input data 612943
24/10/08 18:54:44 WARN PipeUtil: Reading Pipe [name=null, format=parquet, preprocessors=null, props={location=/tmp/zingg/1/trainingData//marked/}]
24/10/08 18:54:45 WARN DSUtil: Read marked training samples
24/10/08 18:54:45 WARN DSUtil: No configured training samples
24/10/08 18:54:46 WARN TrainingDataFinder: Read training samples 0 neg 21
24/10/08 18:54:46 INFO TrainingDataFinder: Created positive sample pairs
24/10/08 18:54:46 WARN SimpleFunctionRegistry: The function removestopwordsudf replaced a previously registered function.
24/10/08 18:54:47 INFO TrainingDataFinder: Preprocessing DS for stopWords
24/10/08 18:54:47 WARN SimpleFunctionRegistry: The function removestopwordsudf replaced a previously registered function.
24/10/08 18:54:47 WARN SimpleFunctionRegistry: The function round replaced a previously registered function.
24/10/08 18:54:52 INFO Heuristics: Block size 8 and total count was 5959
24/10/08 18:54:52 INFO Heuristics: Heuristics suggest 8
24/10/08 18:54:52 INFO BlockingTreeUtil: Learning indexing rules for block size 8
24/10/08 18:54:58 WARN CacheManager: Asked to cache already cached data.
24/10/08 18:54:59 INFO ModelUtil: Learning similarity rules
24/10/08 18:54:59 INFO ModelUtil: Start reading internal configurations and functions
24/10/08 18:54:59 INFO ModelUtil: Finished reading internal configurations and functions
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function affinegapsimilarityfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function jarowinklerfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function affinegapsimilarityfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function jarowinklerfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function affinegapsimilarityfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function jarowinklerfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function affinegapsimilarityfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function jarowinklerfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function affinegapsimilarityfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function jarowinklerfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function affinegapsimilarityfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function jarowinklerfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function affinegapsimilarityfunction replaced a previously registered function.
24/10/08 18:54:59 WARN SimpleFunctionRegistry: The function jarowinklerfunction replaced a previously registered function.
24/10/08 18:55:05 WARN InstanceBuilder: Failed to load implementation from:dev.ludovic.netlib.blas.JNIBLAS
24/10/08 18:55:07 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/10/08 18:55:14 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/10/08 18:55:19 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/10/08 18:55:23 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/10/08 18:55:27 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/10/08 18:55:28 ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?
24/10/08 18:55:29 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/10/08 18:55:30 ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?
24/10/08 18:55:31 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/10/08 18:55:32 ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?
24/10/08 18:55:34 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/10/08 18:55:34 ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?
i used findTrainingData, then did the labeling, and had zero matches. is there a way to ask for more training data?
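The two signals that matter in the pasted log are the `Read training samples 0 neg 21` line and the repeated `ERROR LBFGS` lines. A minimal sketch for pulling them out of a saved log follows. It assumes the message formats are exactly as pasted above, and it reads the first number as the count of pairs labeled as matches, which is an inference from the report of zero matches rather than something the log states. The function name `summarize_log` is hypothetical.

```python
import re

# Pattern taken from the pasted line:
#   "TrainingDataFinder: Read training samples 0 neg 21"
# Assumption: the first number is the match (positive) count, the second the non-match count.
SAMPLES_RE = re.compile(r"Read training samples (\d+) neg (\d+)")

# Substring shared by both "Failure! Resetting history" and "Failure again!" lines.
LBFGS_FAIL = "ERROR LBFGS: Failure"


def summarize_log(lines):
    """Return the labeled-pair counts and the number of LBFGS failure lines."""
    summary = {"positive": None, "negative": None, "lbfgs_failures": 0}
    for line in lines:
        match = SAMPLES_RE.search(line)
        if match:
            summary["positive"] = int(match.group(1))
            summary["negative"] = int(match.group(2))
        if LBFGS_FAIL in line:
            summary["lbfgs_failures"] += 1
    return summary


# Three lines copied from the log above.
sample = [
    "24/10/08 18:54:46 WARN TrainingDataFinder: Read training samples 0 neg 21",
    "24/10/08 18:55:07 ERROR LBFGS: Failure! Resetting history: "
    "breeze.optimize.FirstOrderException: Line search failed",
    "24/10/08 18:55:28 ERROR LBFGS: Failure again! Giving up and returning. "
    "Maybe the objective is just poorly behaved?",
]

print(summarize_log(sample))
```

On a full log file, pass the open file handle, for example `summarize_log(open("zingg.log"))`, where `zingg.log` is a placeholder path.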