-
Notifications
You must be signed in to change notification settings - Fork 654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SEDONA-191] Support join types other than inner in BroadcastIndexJoinExec #711
Conversation
This requires more testing before it can be merged. Just wanted to get the draft up for some possible feedback and to run the github actions. |
extraCondition: Option[Expression], | ||
distance: Option[Expression]): Seq[SparkPlan] = { | ||
|
||
val broadcastSide = joinType match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll cover LeftExistence
in a followup - it's more tedious to implement and test.
|
||
// Using UDFs rather than lit prevents optimizations that would circumvent the checks we want to test | ||
val one = udf(() => 1) | ||
val two = udf(() => 2) | ||
val one = udf(() => 1).asNondeterministic() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added nondeterminist so the optimizer could modify the plans even less
@Kimahriman you introduced broadcast joins here, perhaps you are the best person to take a look at this improvement. |
@Kimahriman Adam, would you please share some insights about this? |
Yeah I peaked at this when it first was put up, I'll look through again with the latest updates |
sql/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala
Show resolved
Hide resolved
Looks good to me. Only other comment would be if the tests could be parameterized at all so to dedupe some of it, and maybe verify the schema and at least one row of data to make sure the output is correct and we don't run into a similar problem as #706 |
I'll improve the test coverage in few days |
Added additional tests in 2624db4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor thing. Would you please update the doc [1] to cover the content introduced in this PR?
[1] https://sedona.apache.org/api/sql/Optimizer/#broadcast-join
Thanks for your contribution! |
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-XXX] my subject
.What changes were proposed in this PR?
Extended the
BroadcastIndexJoinExec
to support left semi, left anti, left outer and right outer joins if there is a broadcast hint on correct side. Currently those joins are run with SparksBroadcastNestedLoopJoinExec
, that can be much slower.A lot of the code is based on
org.apache.spark.sql.execution.joins.HashJoin
.How was this patch tested?
New unittests
Did this PR include necessary documentation updates?