I have 700k records, and the findTrainingData phase takes more than 40 minutes, even though I have given it 32 GB of RAM. It also fails after the second findTrainingData run with the following messages:
24/04/04 22:12:39 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/04/04 22:12:39 ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?
24/04/04 22:12:49 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/04/04 22:12:50 ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?
24/04/04 22:12:55 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/04/04 22:12:56 ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?
24/04/04 22:13:01 ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
24/04/04 22:13:02 ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?
Changing labelDataSampleSize from 0.5 to 0.08 made it much faster, and it worked correctly the second time.
Why do all the examples use 0.5 for labelDataSampleSize when the docs say it should be between 0.0001 and 0.1?
Glad you figured it out! The examples mostly use the febrl dataset, which has only 65 rows and hence needs a higher value. Apologies if that caused confusion.
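To make the difference concrete, here is a rough sketch of how many rows each setting would sample. It assumes labelDataSampleSize is applied as a plain fraction of the input rows, and `sampled_rows` is a hypothetical helper written for illustration, not part of any library:

```python
def sampled_rows(total_rows: int, label_data_sample_size: float) -> int:
    """Approximate rows sampled, assuming the setting is a plain fraction of the input."""
    return round(total_rows * label_data_sample_size)


# Row counts and fractions taken from the discussion above.
for total, fraction in [(700_000, 0.5), (700_000, 0.08), (65, 0.5)]:
    print(f"{total} rows at {fraction}: ~{sampled_rows(total, fraction)} sampled")
```

Under that assumption, 0.5 on 700k records samples hundreds of thousands of rows, while 0.5 on the 65-row febrl data samples only a few dozen, which is why the example value does not carry over to large inputs.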