Hi all,
We are trying to run Zingg 0.4.0 on the newly released AWS Glue 5.0, but we are running into issues with the Zingg pipes, specifically after the trainMatch phase, when the output pipe is accessed. The logs indicate that, with large datasets, the workers fail to communicate with the driver, which causes the SparkSession to terminate.
We have tried tuning various Spark parameters, without success so far. The problem appears to be related to the Spark setup, since it did not occur on Glue 4.0 with the Zingg package compiled for Spark 3.3.x.
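For anyone comparing notes, here is a minimal sketch of the kind of driver/worker timeout settings involved. `spark.network.timeout` and `spark.executor.heartbeatInterval` are standard Spark properties, but the values, the helper names (`parse_seconds`, `build_glue_conf`), and the practice of chaining several settings into a single Glue `--conf` job parameter are illustrative assumptions, not our actual job configuration or a confirmed fix.

```python
def parse_seconds(value: str) -> int:
    """Convert a Spark-style duration such as '60s' or '10m' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]


def build_glue_conf(settings: dict) -> str:
    """Join Spark settings into one string for a Glue '--conf' job parameter.

    Assumption: multiple settings are chained as 'k1=v1 --conf k2=v2',
    a commonly used convention rather than something verified on Glue 5.0.
    """
    heartbeat = parse_seconds(settings["spark.executor.heartbeatInterval"])
    timeout = parse_seconds(settings["spark.network.timeout"])
    # Spark expects the heartbeat interval to be well below the network timeout.
    if heartbeat >= timeout:
        raise ValueError("heartbeatInterval must be smaller than network.timeout")
    return " --conf ".join(f"{key}={val}" for key, val in settings.items())


# Illustrative values only (assumed, not taken from a real job).
EXAMPLE_SETTINGS = {
    "spark.network.timeout": "800s",
    "spark.executor.heartbeatInterval": "60s",
}

if __name__ == "__main__":
    print(build_glue_conf(EXAMPLE_SETTINGS))
```

The resulting string would go in as the value of the job's `--conf` parameter. The check simply guards against setting a heartbeat interval that is not shorter than the network timeout.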
It would be great to hear whether anyone in the community has faced similar issues with Glue 5.0 and managed to resolve them. Thank you.