@sonalgoyal When running the notebook at https://github.com/zinggAI/zingg/blob/main/examples/databricks/FebrlExample.ipynb with Zingg 0.4.0 and DBR 14.2, we get the error below in cell 25 while running the 'findTrainingData' phase. Commands are also missing from cells 10 and 11 of the notebook. Kindly advise on that as well.

Error:

Py4JJavaError                             Traceback (most recent call last)
File , line 5
      3 #Zingg execution for the given phase
      4 zingg = ZinggWithSpark(args, options)
----> 5 zingg.initAndExecute()

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-4f0b1aa6-f6e0-4f4f-bc11-9720d8a947d7/lib/python3.10/site-packages/zingg/client.py:144, in Zingg.initAndExecute(self)
    142     self.client.execute()
    143 else:
--> 144     self.client.execute()
Sorry, I can't find the error in the message above. Can you please tell me what the error is? Also, is the Zingg 0.4.0 jar loaded as a library on the cluster?
Sonal G., yes, the jar is already installed on the cluster. It seems the code breaks while the zingg.initAndExecute() method is running. Attaching the complete error message FYI.

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
File <command-988354905897275>, line 5
      3 #Zingg execution for the given phase
      4 zingg = ZinggWithSpark(args, options)
----> 5 zingg.initAndExecute()

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-5f839110-a386-494b-8d14-62aa45cbe987/lib/python3.10/site-packages/zingg/client.py:144, in Zingg.initAndExecute(self)
    142     self.client.execute()
    143 else:
--> 144     self.client.execute()

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
   1349 command = proto.CALL_COMMAND_NAME +\
   1350     self.command_header +\
   1351     args_command +\
   1352     proto.END_COMMAND_PART
   1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
   1356     answer, self.gateway_client, self.target_id, self.name)
   1358 for temp_arg in temp_args:
   1359     if hasattr(temp_arg, "_detach"):

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:188, in capture_sql_exception.<locals>.deco(*a, **kw)
    186 def deco(*a: Any, **kw: Any) -> Any:
    187     try:
--> 188         return f(*a, **kw)
    189     except Py4JJavaError as e:
    190         converted = convert_exception(e.java_exception)

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o458.execute.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
	at zingg.spark.core.util.SparkBlockingTreeUtil.getBlockHashes(SparkBlockingTreeUtil.java:46)
	at zingg.common.core.executor.TrainingDataFinder.execute(TrainingDataFinder.java:83)
	at zingg.common.client.Client.execute(Client.java:246)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.lang.Thread.run(Thread.java:750)
Are you not on DBR 14.2?
We are using DBR 14.2.
And it says Spark 3.5.0?
Yes, it says 3.5.0.
The RowEncoder call to Spark was removed in Zingg 0.4.0. If you are still seeing it, please verify that the Zingg version is correct and that only one version of the Zingg jar is installed on your cluster.
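If it helps, here is a minimal sketch of that check in Python. It assumes you have already collected the paths of the jars installed on the cluster into a list (for example by copying them from the cluster's Libraries tab), and that the Zingg jar file names look like zingg-<version>...jar. The function names, the file-name pattern, and the sample paths are all made up for illustration; none of this is part of Zingg or Databricks.

```python
import re
from collections import defaultdict

# Assumed naming pattern, e.g. "zingg-0.4.0.jar" or "zingg-0.3.4-SNAPSHOT.jar".
# Adjust it if your jar files are named differently.
ZINGG_JAR = re.compile(r"zingg[-_]v?(\d+\.\d+\.\d+)[^/]*\.jar$", re.IGNORECASE)


def zingg_versions(jar_paths):
    """Group the given jar paths by the Zingg version parsed from the file name."""
    found = defaultdict(list)
    for path in jar_paths:
        match = ZINGG_JAR.search(path)
        if match:
            found[match.group(1)].append(path)
    return dict(found)


def check_single_zingg(jar_paths, expected="0.4.0"):
    """Return True only if exactly one Zingg jar is present and it is the expected version."""
    versions = zingg_versions(jar_paths)
    total = sum(len(paths) for paths in versions.values())
    return total == 1 and expected in versions


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    installed = [
        "dbfs:/FileStore/jars/zingg-0.4.0.jar",
        "dbfs:/FileStore/jars/zingg-0.3.4-SNAPSHOT.jar",
    ]
    print(zingg_versions(installed))
    print(check_single_zingg(installed))
```

If more than one version shows up, an older jar left on the cluster would be a plausible source of the RowEncoder call in the stack trace. Removing it and restarting the cluster is the usual next step.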