-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[multistage][bugfix] eliminate multiple exchanges #11882
Conversation
7f2962b
to
ce4a8eb
Compare
Codecov Report
@@ Coverage Diff @@
## master #11882 +/- ##
============================================
- Coverage 61.42% 61.40% -0.03%
+ Complexity 1147 1146 -1
============================================
Files 2375 2375
Lines 128501 128501
Branches 19846 19846
============================================
- Hits 78936 78907 -29
- Misses 43859 43887 +28
- Partials 5706 5707 +1
Flags with carried forward coverage won't be shown. Click here to find out more. see 11 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
"\n LogicalTableScan(table=[[a]])", | ||
"\n PinotLogicalExchange(distribution=[hash[0]])", | ||
"\n LogicalProject(col1=[$0], col3=[$2])", | ||
"\n LogicalFilter(condition=[>($2, 10)])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting for two filters, I think visitFilter in ServerPlanRequestVisitor will override each other, so one predicate will be skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not unexpected. (we can potentially make it more smart). the problem is that
WITH tmp1
AS (SELECT *
FROM a
WHERE col2 NOT IN ( 'foo', 'bar' )),
tmp2
AS (SELECT *
FROM b
WHERE col1 IN (SELECT col1
FROM tmp1)
AND col3 < 100)
SELECT *
FROM tmp2
WHERE col1 IN (SELECT col1
FROM tmp1 -- this is the tmp1 modification into tmp1'
WHERE col3 > 10)
at this stage
- Calcite's doesn't know if you want to do a table spool of tmp1 and make tmp1' derived from tmp1 --> thus col3 > 10 is not pushed down.
- only when we decided that table spool is not possible and then we decided to copy tmp1 and tmp1' into 2 separate sub queries, at that time i think the hep optimizer already get passed the filter merging phase.
rewriting the query in this way
WITH tmp1
AS (SELECT *
FROM a
WHERE col2 NOT IN ( 'foo', 'bar' )),
tmp2
AS (SELECT *
FROM b
WHERE col1 IN (SELECT col1
FROM tmp1)
AND col3 < 100),
tmp3 -- here we explicitly tell that tmp3 and tmp1 are not related
AS (SELECT *
FROM a
WHERE col2 NOT IN ( 'foo', 'bar' ) AND col3 > 10),
SELECT *
FROM tmp2
WHERE col1 IN (SELECT col1
FROM tmp3)
produces no multi-filter plan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we cannot avoid this, then it means the ServerPlanRequestVisitor should handle the leaf query generation properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a un-plannable sql before or after this bug fix PR. I would suggest differ the fix into a different PR (i will change the test to make sure it doesn't happen)
ce4a8eb
to
e224211
Compare
this fixes #11881
this rule will be merged into a more sophisticated rule that will be created in #11831