Troubleshooting Schema Inference and Empty DataFrame Issue When Reading CSV in Databricks with Zingg Output
Hello team, and thanks a ton for building Zingg! What an amazing tool :) I managed to install it through UC in Databricks and it works. It also seems to learn after each set of labels I provide, which is a good sign.

However, when I try to read the predicted matching records at the end:

    outputDF = spark.read.csv("dbfs:/FileStore/zingg/zingg05Trial29Sep_2_cleanPhone")
    colNames = ["z_minScore", "z_maxScore", "z_cluster", 'CompanyID', 'CompanyName', 'Email', 'Phone', 'Address1', 'LastName', 'FirstName', 'Address2', 'Country']
    outputDF.toDF(*colNames).show(100)

I get:

[[UNABLE_TO_INFER_SCHEMA](https://learn.microsoft.com/azure/databricks/error-messages/error-classes#unable_to_infer_schema)] Unable to infer schema for CSV. It must be specified manually. SQLSTATE: 42KD9

    File <command-8616479842645297>, line 2
    ----> 2 outputDF = spark.read.csv("dbfs:/FileStore/zingg/zingg05Trial29Sep_2_cleanPhone")

So I specify the schema manually:

    schema_out = "z_minScore float, z_maxScore float, z_cluster int, CompanyName string, CompanyID string, FirstName string, LastName string, Email string, Phone string, Country string, Address1 string, Address2 string"
    outputDF = spark.read.csv("dbfs:/FileStore/zingg/zingg05Trial29Sep_2_cleanPhone", schema_out)

but then I get an empty DataFrame:

    +----------+----------+---------+---------+-----------+-----+-----+--------+--------+---------+--------+-------+
    |z_minScore|z_maxScore|z_cluster|CompanyID|CompanyName|Email|Phone|Address1|LastName|FirstName|Address2|Country|
    +----------+----------+---------+---------+-----------+-----+-----+--------+--------+---------+--------+-------+
    +----------+----------+---------+---------+-----------+-----+-----+--------+--------+---------+--------+-------+

Please help me figure out what might have gone wrong here.
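One way to narrow this down, offered as a sketch rather than a confirmed diagnosis: an "unable to infer schema" error followed by an empty result with a manual schema is consistent with Spark finding no readable CSV data files at that path. Listing the directory and counting the files by extension would show whether it holds CSV part files, files in another format, or only marker files. The helper name `summarize_data_files` and the example listing below are hypothetical and only for illustration; on Databricks the names could come from `dbutils.fs.ls`.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_data_files(file_names):
    """Count files by extension, skipping marker/hidden files.

    Hypothetical helper: takes plain file names and returns a dict
    mapping extension -> count.
    """
    counts = Counter()
    for name in file_names:
        # Directory entries may end with "/", so strip it before taking the name
        base = PurePosixPath(name.rstrip("/")).name
        # Spark ignores files whose names start with "_" or "." when reading a directory
        if not base or base.startswith(("_", ".")):
            continue
        suffix = PurePosixPath(base).suffix.lower() or "(no extension)"
        counts[suffix] += 1
    return dict(counts)


# Made-up listing for illustration only, not the real contents of the Zingg output path
example_listing = ["_SUCCESS", "_committed_123", "part-00000-abc.csv", "part-00001-abc.csv"]
print(summarize_data_files(example_listing))

# On Databricks, the real listing could be obtained like this (not run here):
# names = [f.name for f in dbutils.fs.ls("dbfs:/FileStore/zingg/zingg05Trial29Sep_2_cleanPhone")]
# print(summarize_data_files(names))
```

If the result is empty, the match phase may not have written any output to this location. If it shows an extension other than `.csv`, the output was probably written in a different format and would need the matching reader. Subdirectories are counted under "(no extension)", so a nested output folder would also show up here.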