How to Link Two Datasets Using PySpark on Databricks: Syntax and Examples | Zingg Community

Apr 24, 2024 07:49 PM

What is the syntax for linking two datasets, preferably in PySpark on Databricks (but I'll take what I can get)? I've seen some references to this feature in this channel, as well as a page in the docs, but the specifics are a bit beyond me.

5 comments

Jesse S.
Well, of course I answer my own question right after finally giving up and asking. This post has a code snippet showing that we can pass multiple input pipes to the data field, which I didn't realize.

Just for the sake of sharing and searchability, here is a code snippet for linking three data sources:

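```python
from zingg.client import *
from zingg.pipes import *

# Placeholder values (hypothetical): replace with your own
modelId = "linkModel"               # hypothetical model id
zinggDir = "/models"                # hypothetical Zingg model directory
source_path = "/path/to/input.csv"  # hypothetical input CSV
output_path = "/path/to/output"     # hypothetical Delta output location

# Model id and directory; the link phase uses the model trained under this id
args = Arguments()
args.setModelId(modelId)
args.setZinggDir(zinggDir)

# `spark` is the SparkSession that Databricks notebooks provide
df = spark.read.option("header", "true").csv(source_path).select("FirstName", "LastName")

# Three sources carved from one file for demonstration only; in practice
# these would be separately loaded DataFrames with the same columns
df1 = df.limit(1000)
df2 = df.limit(100)
df3 = df.limit(200)
inputPipe1 = InMemoryPipe("source1", df1)
inputPipe2 = InMemoryPipe("source2", df2)
inputPipe3 = InMemoryPipe("source3", df3)

# Write the linked results in Delta format
outputPipe = Pipe(name="output", format="delta")
outputPipe.addProperty("path", output_path)

# setData accepts several pipes; each named pipe is one source to link
args.setData(inputPipe1, inputPipe2, inputPipe3)
args.setOutput(outputPipe)

# Match on both name columns with fuzzy matching
field_FirstName = FieldDefinition("FirstName", "string", MatchType.FUZZY)
field_LastName = FieldDefinition("LastName", "string", MatchType.FUZZY)
fieldDefs = [field_FirstName, field_LastName]
args.setFieldDefinition(fieldDefs)

# Run the link phase
options = ClientOptions([ClientOptions.PHASE, "link"])
zingg = ZinggWithSpark(args, options)
zingg.initAndExecute()
```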
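As a rough, Zingg-independent illustration of what linking on the two FUZZY fields means, the sketch below compares records only across sources, never within one, and flags pairs whose average FirstName/LastName similarity clears a fixed threshold. This is not Zingg's algorithm: Zingg learns blocking and a match model from labeled pairs, whereas `similarity`, `link_records`, the 0.8 threshold and the sample names here are all hypothetical stand-ins built on Python's standard `difflib`.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def link_records(sources, threshold=0.8):
    """Return (src_a, idx_a, src_b, idx_b, score) for cross-source pairs whose
    mean FirstName/LastName similarity is at least `threshold`.

    Hypothetical toy stand-in: records in the same source are never compared.
    """
    matches = []
    # Every unordered pair of distinct sources, e.g. (source1, source2)
    for (name_a, recs_a), (name_b, recs_b) in combinations(sources.items(), 2):
        for i, (first_a, last_a) in enumerate(recs_a):
            for j, (first_b, last_b) in enumerate(recs_b):
                score = (similarity(first_a, first_b) + similarity(last_a, last_b)) / 2
                if score >= threshold:
                    matches.append((name_a, i, name_b, j, round(score, 3)))
    return matches


# Hypothetical sample data: each source is a list of (FirstName, LastName)
sources = {
    "source1": [("John", "Smith"), ("Mary", "Jones")],
    "source2": [("Jon", "Smith"), ("Peter", "Brown")],
    "source3": [("Mary", "Jones")],
}

for match in link_records(sources):
    print(match)
```

With this sample data, only John Smith/Jon Smith (source1 and source2) and the two Mary Jones records (source1 and source3) clear the threshold.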

Feel free to let me know if something about this can be done better.

Vikas G.
You are right, link is the way to go.
Vikas G.
More details: https://docs.zingg.ai/zingg/stepbystep/link
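One way to see why link fits this case is to count comparisons. Assuming link pairs only records from different sources, while deduplicating would pool every record into one dataset, the back-of-the-envelope sketch below counts exhaustive pairs for the nominal source sizes in the snippet above (1000, 100 and 200 rows). It is an upper bound only: Zingg uses blocking to avoid comparing every pair, and the helper names are hypothetical.

```python
from itertools import combinations
from math import comb


def pooled_pairs(sizes):
    """Pairs if all records were pooled and compared exhaustively (dedupe-style)."""
    return comb(sum(sizes), 2)


def cross_source_pairs(sizes):
    """Pairs if only records from different sources are compared (link-style)."""
    return sum(a * b for a, b in combinations(sizes, 2))


sizes = [1000, 100, 200]  # nominal sizes of df1, df2, df3
print(pooled_pairs(sizes))        # 844350
print(cross_source_pairs(sizes))  # 320000
```

The gap, 524350 pairs, is exactly the within-source comparisons that linking skips.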
Sonal G.
Thanks for sharing, Jesse S.! If you want to submit a pull request for the examples, that would be great too.