How to convert a manual CSV schema script to use MongoDB pipeline with included JAR?

Brandon M. · 2024-05-04T03:47:48.723Z

Hi friends, I'm trying to convert my manually run CSV jobs into an actual Python script.

schema = "first_name string,last_name string,phone string,url string,title string,department string,al string,Persona string,generated_email0 string,generated_email1 string,generated_email2 string,generic_email string,email string,entity_pursuit_id string,display_name string,state_abbr string,lsadc string,sum_lev string"
inputPipe = CsvPipe("testFebrl", "cleargov.csv", schema)
args.setData(inputPipe)

How would I convert this to use a MongoDB pipe? I've included the jar.

Zingg Community

Sonal G.
·
You should be able to use Pipe and set the appropriate properties as mentioned in the docs: https://docs.zingg.ai/zingg0.4.0/connectors/mongodb
Brandon M.
·
Thanks Sonal G. -- I'm still trying to figure out how to get this to work, but figured I'd ask in advance: how do I add a query? From reading the MongoDB docs, it looks like you'd have to add a .filter() after the spark.load() call.
Brandon M.
·
Have y'all ever actually tested using MongoDB? It's way more involved than just passing in that data blob.
Brandon M.
·
Mongo's Spark connector version only works with Spark 3.1 through 3.2.4.

Brandon M.

They just released a new Spark Mongo connector version, and I'm getting:

contacts-1  | : java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.encoders.ExpressionEncoder org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolveAndBind(scala.collection.immutable.Seq, org.apache.spark.sql.catalyst.analysis.Analyzer)'

Sonal G.
·
Brandon M. We have not recently run Zingg against Mongo. One option for you is to look at Zingg’s InMemory pipe. The second is to continue to use the Pipe and pass any values that need configuring as props on the Zingg Pipe: pipe.setProp("option", "val").
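A minimal sketch of the second option in Python, assuming the Zingg Python package exposes a generic `Pipe(name, format)` with `addProperty(name, value)` (the Python counterpart of `setProp`), and assuming the MongoDB Spark connector 10.x option names (`connection.uri`, `database`, `collection`, `aggregation.pipeline`). The helper names, URI, database, and collection values are hypothetical; older connector releases use the format name "mongo" and different property names, so check them against the connector version you have on the classpath.

```python
def mongo_pipe_props(uri, database, collection, extra=None):
    """Collect connector options as a plain dict (option names are assumptions)."""
    props = {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }
    # Extra connector options, e.g. a server-side aggregation pipeline.
    props.update(extra or {})
    return props


def build_mongo_pipe(name, props, fmt="mongodb"):
    """Create a generic Zingg Pipe and copy every option onto it as a prop."""
    # Imported lazily so the helper above can be used without Zingg installed.
    from zingg.pipes import Pipe

    pipe = Pipe(name, fmt)
    for key, value in props.items():
        pipe.addProperty(key, value)
    return pipe


# Hypothetical usage inside a Zingg script where `args` already exists:
# props = mongo_pipe_props(
#     "mongodb://localhost:27017", "crm", "contacts",
#     extra={"aggregation.pipeline": '[{"$match": {"state_abbr": "CA"}}]'},
# )
# args.setData(build_mongo_pipe("contacts", props))
```

If the connector honors `aggregation.pipeline`, the filtering happens in MongoDB before the data reaches Spark, which may remove the need for a separate .filter() step.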
Sonal G.
·
The error you shared is a version mismatch error. Zingg 0.4.0 needs Spark 3.5. You can try with an older Zingg release (0.3.4) and see what happens.
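One quick way to rule out this kind of mismatch is to compare the running Spark version against the one the Zingg release expects before loading any connector. The sketch below is only an illustration; the helper names are hypothetical, and the "3.5" default simply restates the requirement mentioned above for Zingg 0.4.0.

```python
def spark_major_minor(version):
    """Reduce a version string such as '3.5.1' to '3.5'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def spark_matches(version, required="3.5"):
    """Return True when the Spark version is in the required major.minor line."""
    return spark_major_minor(version) == required


# Hypothetical usage with an existing SparkSession named `spark`:
# if not spark_matches(spark.version):
#     raise RuntimeError(f"Spark {spark.version} does not match the required 3.5.x")
```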
Brandon M.
·
Yeah Sonal G. -- I've gathered that much. I've been really struggling to learn Spark, haha. https://jira.mongodb.org/browse/SPARK-413 They say it supports 3.5, but I think it's a bug?
Brandon M.
·
Sonal G. if I wanted to run the "filter" option on the MongoDB connector, which happens after .load() on the Spark pipe, are you suggesting I use an InMemory pipe?
Sonal G.
·
Yes, you can build the df using standard Spark transforms and then send it to Zingg using InMemoryPipe.
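A minimal sketch of that approach in Python, assuming the MongoDB Spark connector 10.x (format name "mongodb" with the `connection.uri`, `database`, and `collection` options) and assuming Zingg's Python `InMemoryPipe(name, df)` accepts a Spark DataFrame. The function names, URI, database, collection, and filter condition are hypothetical placeholders.

```python
def load_filtered_mongo_df(spark, uri, database, collection, condition):
    """Read a MongoDB collection through Spark and keep rows matching `condition`."""
    reader = (
        spark.read.format("mongodb")        # assumption: connector 10.x format name
        .option("connection.uri", uri)
        .option("database", database)
        .option("collection", collection)
    )
    # .filter() runs after .load(), as with any other Spark DataFrame.
    return reader.load().filter(condition)


def to_in_memory_pipe(name, df):
    """Wrap an already-built DataFrame so Zingg can use it as input data."""
    # Imported lazily so the loader above can be used without Zingg installed.
    from zingg.pipes import InMemoryPipe

    return InMemoryPipe(name, df)


# Hypothetical usage inside a Zingg script where `spark` and `args` already exist:
# df = load_filtered_mongo_df(
#     spark, "mongodb://localhost:27017", "crm", "contacts", "state_abbr = 'CA'"
# )
# args.setData(to_in_memory_pipe("contacts", df))
```

Building the DataFrame outside Zingg keeps the connector-specific details (options, filters, joins) in ordinary Spark code, and Zingg only sees the final result.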
Brandon M.
·
Sonal G. do you think you could point me in the right direction...
Brandon M.
·
with some code?
Brandon M.
·
Sonal G. sorry to bug you. I really want to get this to work. It's critical for my company to put this in production, and we use MongoDB, and it's not easy (for me) lol.
Brandon M.
·
I believe I've got the MongoDB pipe working, but I'm not sure how to run a filter on it. Would that be something like getDfFromDs and then running a .filter? Or how would I "build the df using standard spark transforms"?
