We are running Zingg OSS in Fabric. Everything was working great until recently; something appears to have changed with either Fabric or Zingg over the past week. All of a sudden the match phase takes forever to run and eventually “completes”, but it does not write any files to the checkpoint or any output to the output pipe. It fails silently, with no indication in any log file as to why. We have tested access and permissions on our side, and we are only running 15,000 records. We are running Fabric runtime 1.3, Spark 3.5, Delta 3.2 and Zingg version 0.5.0. Is anyone else running into this issue with Fabric, specifically with the match phase?
That's weird. What version of Spark are you using? Could you send us a thread dump of the driver and of one of the executors with active tasks running?
Sorry, I saw you already included the version.
Hi Angel, thanks for replying. I tried on multiple occasions to get the thread dumps, but the Spark UI becomes unavailable while the match notebook is active and running. I did download the event logs after the session disconnected; they are about 3 GB in size and don't really indicate any glaring issues. Any other options or ideas? I did log a ticket with Microsoft, as this was working fine before their November release. No update from them yet.
You mentioned a notebook. Are you using PySpark?
Yes, PySpark.
Most likely it’s not going to provide any clues, but what about spawning a thread in the notebook to get the Python threads’ stack traces? I wrote about how to do it in this article: https://blog.devgenius.io/apache-spark-wtf-written-in-pyspark-34591c5f32bf
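For anyone who wants to try this without the Spark UI, here is a minimal sketch of the idea using only the Python standard library. It is not the code from the linked article, and the names `format_thread_stacks` and `start_stack_dumper` are made up for illustration. It only shows the Python-side stacks of the driver process, not the JVM threads, so it may well show the main thread simply waiting on the JVM.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a string with the current stack of every Python thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps thread id -> that thread's topmost frame.
    for ident, frame in sys._current_frames().items():
        chunks.append(f"--- Thread {names.get(ident, 'unknown')} ({ident}) ---")
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


def start_stack_dumper(interval_seconds=60.0, sink=print):
    """Start a daemon thread that passes a stack dump to `sink` every interval.

    Returns a threading.Event; call .set() on it to stop the dumper.
    """
    stop_event = threading.Event()

    def _loop():
        # Event.wait returns False on timeout and True once the event is set.
        while not stop_event.wait(interval_seconds):
            sink(format_thread_stacks())

    thread = threading.Thread(target=_loop, name="stack-dumper", daemon=True)
    thread.start()
    return stop_event
```

You would call `stop = start_stack_dumper(60)` in a cell before starting the match phase and `stop.set()` once it finishes. Whether the output shows up while the cell is blocked depends on how the notebook flushes output, so passing a `sink` that appends to a file may be more reliable.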
Hey, just wanted to post here as well that we resolved this. It is due to a Spark issue in the latest Fabric release, where large physical plans can cause out-of-memory issues when adaptive query execution is enabled. They will fix it in a future release. Workaround: set this at the beginning of your code, especially when matching large and complex datasets. Without it, match fails every time for us now. spark.conf.set("spark.sql.adaptive.enabled", "false")
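If you only want adaptive query execution off for the match step, one option is to remember the previous value and restore it afterwards. This is just a sketch: `disable_aqe` is a hypothetical helper name, and it assumes `spark` is the session object the notebook already provides. It uses only `spark.conf.get` and `spark.conf.set`.

```python
AQE_KEY = "spark.sql.adaptive.enabled"


def disable_aqe(spark):
    """Turn off adaptive query execution and return the previous setting."""
    previous = spark.conf.get(AQE_KEY)
    spark.conf.set(AQE_KEY, "false")
    return previous


# Usage in a notebook cell (assumes `spark` already exists):
#   previous = disable_aqe(spark)
#   ... run the Zingg match phase ...
#   spark.conf.set(AQE_KEY, previous)
```

Restoring the setting is optional; the workaround above simply leaves it disabled for the whole session.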