Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEDONA-191] Support join types other than inner in BroadcastIndexJoinExec #711

Merged
merged 14 commits into from
Nov 26, 2022

Conversation

tanelk
Copy link
Contributor

@tanelk tanelk commented Nov 11, 2022

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

Extended the BroadcastIndexJoinExec to support left semi, left anti, left outer and right outer joins if there is a broadcast hint on correct side. Currently those joins are run with Sparks BroadcastNestedLoopJoinExec, that can be much slower.

A lot of the code is based on org.apache.spark.sql.execution.joins.HashJoin.

How was this patch tested?

New unittests

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the docs.

@tanelk
Copy link
Contributor Author

tanelk commented Nov 11, 2022

This requires more testing before it can be merged. Just wanted to get the draft up for some possible feedback and to run the github actions.

@tanelk tanelk changed the title [WIP][SEDONA-191] Support join types other than inner in BroadcastIndexJoinExec [SEDONA-191] Support join types other than inner in BroadcastIndexJoinExec Nov 18, 2022
@tanelk tanelk marked this pull request as ready for review November 18, 2022 15:17
extraCondition: Option[Expression],
distance: Option[Expression]): Seq[SparkPlan] = {

val broadcastSide = joinType match {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll cover LeftExistence in a followup - it's more tedious to implement and test.


// Using UDFs rather than lit prevents optimizations that would circumvent the checks we want to test
val one = udf(() => 1)
val two = udf(() => 2)
val one = udf(() => 1).asNondeterministic()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added nondeterminist so the optimizer could modify the plans even less

@tanelk
Copy link
Contributor Author

tanelk commented Nov 18, 2022

@Kimahriman you introduced broadcast joins here, perhaps you are the best person to take a look at this improvement.
Also pinging @jiayuasu - I think this is ready for review now

@jiayuasu
Copy link
Member

@Kimahriman Adam, would you please share some insights about this?

@Kimahriman
Copy link
Contributor

Yeah I peaked at this when it first was put up, I'll look through again with the latest updates

@Kimahriman
Copy link
Contributor

Looks good to me. Only other comment would be if the tests could be parameterized at all so to dedupe some of it, and maybe verify the schema and at least one row of data to make sure the output is correct and we don't run into a similar problem as #706

@tanelk
Copy link
Contributor Author

tanelk commented Nov 20, 2022

Looks good to me. Only other comment would be if the tests could be parameterized at all so to dedupe some of it, and maybe verify the schema and at least one row of data to make sure the output is correct and we don't run into a similar problem as #706

I'll improve the test coverage in few days

@tanelk
Copy link
Contributor Author

tanelk commented Nov 23, 2022

Added additional tests in 2624db4

@jiayuasu jiayuasu added this to the sedona-1.3.1 milestone Nov 23, 2022
Copy link
Member

@jiayuasu jiayuasu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor thing. Would you please update the doc [1] to cover the content introduced in this PR?

[1] https://sedona.apache.org/api/sql/Optimizer/#broadcast-join

@tanelk tanelk requested a review from jiayuasu November 25, 2022 16:59
@jiayuasu
Copy link
Member

Thanks for your contribution!

@jiayuasu jiayuasu merged commit 8489b2d into apache:master Nov 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants