Zingg Analysis: Ambiguous Reference to ID Column Causing No Score Table Output After Retraining Models

Wayne F. · 2025-02-14T17:26:27.748Z

OK, we relabeled and retrained both of our models and are beyond the Java errors. (I'd also add that 0.4.0 seems to be improved in the labeling step.) Now I'm not getting an error, but Zingg is also not outputting the score table. There are no indications in the notebook running Zingg, but looking at the cluster stderr I see:

org.apache.spark.sql.AnalysisException: [AMBIGUOUS_REFERENCE] Reference `id` is ambiguous, could be: [`id`, `id`]. SQLSTATE: 42704

We do in fact have a column ID in our input:

- ('ID', 'string', 'DONT_USE')

So is that a reserved column name, and might Zingg be adding another ID in its processing? (Spark or Databricks wants to lowercase column names, so there's no difference between ID and id.)
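One way to catch this kind of clash before running a match is to compare the input column names, case-insensitively, against names that downstream libraries claim for themselves. The sketch below is only an illustration: the default reserved set (`id`, `src`, `dst`) reflects the column names GraphFrames expects on its vertex and edge DataFrames, and it is an assumption that this is what collides with the input `ID` column here. The function name `find_reserved_collisions` is hypothetical and not part of Zingg.

```python
def find_reserved_collisions(columns, reserved=("id", "src", "dst")):
    """Return the input columns whose lowercased name matches a reserved name.

    `reserved` defaults to the column names GraphFrames uses for vertices
    ("id") and edges ("src", "dst"). Treating these as the source of the
    AMBIGUOUS_REFERENCE error is an assumption, not something confirmed
    for Zingg 0.4.0.
    """
    reserved_lower = {name.lower() for name in reserved}
    return [col for col in columns if col.lower() in reserved_lower]


# Example with a schema shaped like the one in the question.
input_columns = ["ID", "FIRST_NAME", "LAST_NAME", "src_system"]
print(find_reserved_collisions(input_columns))
```

If the list is non-empty, renaming those columns in the input (and in the field definitions) before running Zingg is a cheap thing to try.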

Zingg Community

Wayne F.

So I renamed the ID column and changed the parameters on the fly, and now get an error similar to what I was getting in the past:

Py4JJavaError: An error occurred while calling o1058.execute.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
	at org.graphframes.lib.ConnectedComponents$.org$graphframes$lib$ConnectedComponents$$run(ConnectedComponents.scala:277)
	at org.graphframes.lib.ConnectedComponents.run(ConnectedComponents.scala:154)
	at zingg.spark.core.util.SparkGraphUtil.buildGraph(SparkGraphUtil.java:39)
	at zingg.common.core.executor.Matcher.getOutput(Matcher.java:175)
	at zingg.common.core.executor.Matcher.writeOutput(Matcher.java:151)
	at zingg.common.core.executor.Matcher.execute(Matcher.java:131)
	at zingg.common.client.Client.execute(Client.java:251)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.lang.Thread.run(Thread.java:750)
File <command-226341090012288>, line 59
     57 # rc = zingg.initAndExecute()
     58 rc = zingg.init()
---> 59 rc = zingg.execute()

Wayne F.
·
Which sounds suspiciously like something going wrong while using GraphFrames to do Connected Component detection.
Wayne F.
·
So it's not clear whether I allowed things to go further by renaming ID to ID_WAYNE or what. Sonal G.
Wayne F.
·
%pip install zingg==0.4.0 pyspark==3.1.2 py4j==0.10.9 pyyaml
Sonal G.
·
Wayne F. Zingg 0.4.0 needs Spark 3.5
Sonal G.
·
is that what is available on your cluster? The pyspark version you just sent points to an older version 🤔
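To answer that question from inside the notebook, the running session can be asked directly for its Spark and Scala versions. This is a minimal sketch: `cluster_versions` is a hypothetical helper name, and it reaches the Scala version through the private `_jvm` handle on the SparkContext, which is an assumption about the environment (it may not be available on clusters that restrict JVM access).

```python
def cluster_versions(spark):
    """Report the Spark and Scala versions of the running session.

    `spark` is the active SparkSession (predefined in Databricks notebooks).
    `_jvm` is a private py4j gateway attribute; relying on it is an
    assumption about the cluster setup, not a supported public API.
    """
    jvm = spark.sparkContext._jvm
    return {
        "spark": spark.version,
        "scala": jvm.scala.util.Properties.versionNumberString(),
    }


# In a notebook cell, with `spark` already defined:
# print(cluster_versions(spark))
```

The Scala version matters as well as the Spark version, because a `NoSuchMethodError` on `scala.Predef$.refArrayOps` usually points to a library compiled against a different Scala binary version than the one the cluster runs.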
Wayne F.
·
OK, I was following documentation or error messages. The current version, I think, is 3.5.1, so maybe it's restricting that.
Wayne F.
·
Sonal G.
The conflict is caused by:
    The user requested py4j
    zingg 0.4.0 depends on py4j==0.10.9
    pyspark 3.5.0 depends on py4j==0.10.9.7
Wayne F.
·
%pip install zingg==0.4.0 pyspark==3.5 py4j pyyaml
Wayne F.
·
%pip install zingg==0.4.0 pyspark==3.5 pyyaml

The conflict is caused by:
    zingg 0.4.0 depends on py4j==0.10.9
    pyspark 3.5.0 depends on py4j==0.10.9.7
Wayne F.
·
If I specify no versions at all, pip installs PySpark 3.1.3, which is essentially what I did by hand above:
%pip install zingg pyspark pyyaml
yields:
Successfully installed py4j-0.10.9 pyspark-3.1.3 zingg-0.4.0
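A quick check of what actually ended up in the Python environment, compared against the Spark 3.5 requirement mentioned earlier in the thread, might look like the sketch below. The helper names (`parse_version`, `meets_minimum`, `installed_versions`) are hypothetical, and the version parsing is deliberately simple: it only understands dotted numeric versions. Note also that the pip-installed `pyspark` package is not necessarily the Spark runtime the cluster executes, so this covers only the Python side.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '3.5.0' into (3, 5, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_versions(packages=("zingg", "pyspark", "py4j")):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The pip output above reported pyspark 3.1.3; the thread says 0.4.0 needs 3.5.
print(installed_versions())
print(meets_minimum("3.1.3", "3.5"))
```

With the versions from the pip output above, the last line reports that PySpark 3.1.3 does not meet the 3.5 minimum.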
