Troubleshooting Zingg Code Error Caused by Last Line in Notebook

Wayne F. · 2025-02-03T14:19:37.082Z

Looking at the Zingg code, this all looks legitimate, but the last line causes an error. If it matters, higher up in the notebook we have:

from zingg.client import *
from zingg.pipes import *

Zingg Community

14 comments

Sonal G.
·
👋 Wayne F., great seeing the progress towards upgrading to the new version. The error indicates that the cluster cannot find the Zingg JAR. Is that set as a cluster library?
Wayne F.
·
We updated it, and it still doesn't seem to work.
Sonal G.
·
This is the Python package, not the Java JAR. Please download the JAR from our GitHub and add it as a cluster library.
Wayne F.
·
There are two files: one a JAR and one the Python. Is the JAR installed in the wrong place?
Wayne F.
·
(This is in the cluster Libraries tab, in the same place as the Zingg 0.3.4 JAR is placed.)
Wayne F.
·
I am not sure that the cluster was restarted after adding it, though, if that makes a difference.
Wayne F.
·
Restarted the cluster...
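One quick way to tell whether the restarted cluster actually sees the Zingg JAR is to ask the JVM for a Zingg class from the notebook. Py4J resolves a dotted name it cannot load as a class to a `JavaPackage` placeholder instead of raising (which is why a missing JAR often surfaces as `'JavaPackage' object is not callable`), and resolves a loadable class to a `JavaClass`. A minimal sketch, assuming the class name `zingg.common.client.Client` taken from the stack trace later in this thread; the helper name `jvm_class_visible` is hypothetical, and `_jvm` is a private PySpark attribute that some Databricks access modes may block.

```python
from typing import Any

# Class name taken from the stack trace (zingg.common.client.Client.execute).
ZINGG_CLIENT_CLASS = "zingg.common.client.Client"


def jvm_class_visible(jvm: Any, dotted_name: str = ZINGG_CLIENT_CLASS) -> bool:
    """Return True if `dotted_name` resolves to a loadable JVM class via Py4J.

    Py4J hands back a JavaPackage placeholder for names it cannot load as a
    class, so only a JavaClass result means the JAR is on the classpath.
    """
    obj = jvm
    for part in dotted_name.split("."):
        obj = getattr(obj, part)
    # Compare by type name so this helper does not need to import py4j itself.
    return type(obj).__name__ == "JavaClass"
```

In the notebook this would be called on the driver as `jvm_class_visible(spark.sparkContext._jvm)` after the restart; `False` suggests the JAR is not attached to this cluster or was not loaded.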

Wayne F.

OK, a different error now. Evidently a mismatch with the Java version? py4j-0.10.9.7 is mentioned, and:

zingg.common.client.ZinggClientException: org.apache.spark.sql.types.StringType$; local class incompatible: stream classdesc serialVersionUID = 3796071416192072411, local class serialVersionUID = 1779832429914676547
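The message has the shape of Java's `java.io.InvalidClassException`: Java serialization records each class's `serialVersionUID` in the byte stream and refuses to deserialize when it differs from the UID of the class loaded locally. The default UID is derived from the class's structure, so it usually changes when the class itself changes between library versions, which points more at a Spark/Zingg build mismatch than at the JDK. A small sketch for pulling the three facts out of such a message when comparing clusters; the names `SerialMismatch` and `parse_serial_mismatch` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SerialMismatch(NamedTuple):
    class_name: str
    stream_uid: int  # UID recorded in the serialized bytes (the writer's build)
    local_uid: int   # UID of the class on the current classpath (the reader's build)


_PATTERN = re.compile(
    r"(?P<cls>[\w.$]+); local class incompatible: "
    r"stream classdesc serialVersionUID = (?P<stream>-?\d+), "
    r"local class serialVersionUID = (?P<local>-?\d+)"
)


def parse_serial_mismatch(message: str) -> Optional[SerialMismatch]:
    """Extract class name and both UIDs from an InvalidClassException message."""
    m = _PATTERN.search(message)
    if m is None:
        return None
    return SerialMismatch(m.group("cls"), int(m.group("stream")), int(m.group("local")))
```

For the message above it yields the class `org.apache.spark.sql.types.StringType$`, stream UID 3796071416192072411 and local UID 1779832429914676547.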

Wayne F.

openjdk version "1.8.0_412"
OpenJDK Runtime Environment (Zulu 8.78.0.19-CA-linux64) (build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (Zulu 8.78.0.19-CA-linux64) (build 25.412-b08, mixed mode)

Wayne F.

If it's helpful, the entire error message is:

Py4JJavaError: An error occurred while calling o488.execute.
: zingg.common.client.ZinggClientException: org.apache.spark.sql.types.StringType$; local class incompatible: stream classdesc serialVersionUID = 3796071416192072411, local class serialVersionUID = 1779832429914676547
	at zingg.common.core.executor.Matcher.execute(Matcher.java:136)
	at zingg.common.client.Client.execute(Client.java:251)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.lang.Thread.run(Thread.java:750)
File <command-226341090012288>, line 51
     49 print(f"   Options: '{options}'")
     50 zingg = ZinggWithSpark(args, options)
---> 51 rc = zingg.initAndExecute()
     52 print(f"   Zingg RC: '{rc}' at {time.ctime()}")
     54 print(f"   Read '{model_details.output_pipe_loc}' at {time.ctime()}")
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/zingg/client.py:144, in Zingg.initAndExecute(self)
    142         self.client.execute()
    143 else:
--> 144     self.client.execute()
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
   1349 command = proto.CALL_COMMAND_NAME +\
   1350     self.command_header +\
   1351     args_command +\
   1352     proto.END_COMMAND_PART
   1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
   1356     answer, self.gateway_client, self.target_id, self.name)
   1358 for temp_arg in temp_args:
   1359     if hasattr(temp_arg, "_detach"):
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:255, in capture_sql_exception.<locals>.deco(*a, **kw)
    252 from py4j.protocol import Py4JJavaError
    254 try:
--> 255     return f(*a, **kw)
    256 except Py4JJavaError as e:
    257     converted = convert_exception(e.java_exception)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Sonal G.
·
Are you running match directly? Since the build has changed, can you run train first with the new build and then run match?
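The suggestion amounts to running the two phases in order against the new build, so that match reads the model train just wrote. A minimal sketch, assuming the `ClientOptions`/`ZinggWithSpark` Python API already imported in the notebook; the helper `run_phases` and its factory parameters are hypothetical, introduced so the ordering logic can be exercised without a cluster.

```python
from typing import Any, Callable, Dict, Sequence


def run_phases(
    args: Any,
    make_options: Callable[[str], Any],
    make_client: Callable[[Any, Any], Any],
    phases: Sequence[str] = ("train", "match"),
) -> Dict[str, Any]:
    """Run phases in order and return each phase's result, keyed by phase name.

    `make_options` builds the options for one phase; `make_client` builds a
    client exposing `initAndExecute()`. Any exception (e.g. a Py4JJavaError)
    propagates, so "match" never runs after a failed "train".
    """
    results: Dict[str, Any] = {}
    for phase in phases:
        client = make_client(args, make_options(phase))
        results[phase] = client.initAndExecute()
    return results
```

In the notebook the wiring would presumably be `run_phases(args, lambda p: ClientOptions([ClientOptions.PHASE, p]), ZinggWithSpark)`, reusing the existing `args`; check the phase names and `ClientOptions` usage against the documentation for the Zingg version installed.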
Wayne F.
·
So retrain on the newer cluster? We had been trying to reuse the original model so as not to change everything at once. (On another note: we did retrain the model on the older cluster. Training took 5x longer than it did previously, but the retrained model evidently performs well again, as it used to. I was assured that nothing could have changed, but obviously something did.)
Wayne F.
·
I'm trying to figure out at what layer in Zingg a Spark StringType object in Java is being serialized, passed through something, and then deserialized, producing this error. The underlying table is stored in Parquet, I believe, so it isn't a serialized Java object, right? I notice mentions of "gateway" in some of the Java errors/code.
Sonal G.
·
The model binary created with the older build will not work with the newer one, so running the train phase again will ensure compatibility.
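Since a model written by one build cannot be read by another, one way to fail fast, instead of hitting the serialVersionUID error deep inside match, is to record which build trained the model and compare before matching. This is a hypothetical convention, not a Zingg feature: the marker file name `build.json` and the helpers `record_build` / `assert_same_build` are assumptions, and the build string is whatever you choose to identify the Zingg JAR and Spark runtime (for example the JAR file name plus `spark.version`). On Databricks the model directory would need to be a local or `/dbfs/...` path for `pathlib` to reach it.

```python
import json
from pathlib import Path

MARKER_NAME = "build.json"  # hypothetical marker file kept next to the model


def record_build(model_dir: str, build: str) -> None:
    """Write the build identifier after a successful train phase."""
    path = Path(model_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / MARKER_NAME).write_text(json.dumps({"build": build}))


def assert_same_build(model_dir: str, build: str) -> None:
    """Raise before match if the model was trained by a different (or unknown) build."""
    marker = Path(model_dir) / MARKER_NAME
    if not marker.exists():
        raise RuntimeError(f"No build marker in {model_dir}; retrain with build {build!r}")
    trained_with = json.loads(marker.read_text())["build"]
    if trained_with != build:
        raise RuntimeError(
            f"Model in {model_dir} was trained with {trained_with!r}, "
            f"current build is {build!r}; run the train phase again"
        )
```

Calling `record_build` right after train and `assert_same_build` right before match turns a build mismatch into an explicit message naming both builds.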