Hey All,
I'm currently working with large tables stored in Hive. When using the interactive interface to explore the data, the process takes quite a long time, even though the environment is distributed. Additionally, when I try to use a sample, the dataset tends to be imbalanced, which prevents me from proceeding to the training phase.
Do you have any suggestions on how to address this issue? Also, is there a way to bypass the interactive interface and move directly to the training phase?
PS: I'm currently working with Zingg version 3.4.
Thank you in advance.