Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multistage][bugfix] eliminate multiple exchanges #11882

Merged
merged 2 commits into from
Oct 30, 2023

Conversation

walterddr
Copy link
Contributor

@walterddr walterddr commented Oct 26, 2023

this fixes #11881

this rule will be merged into a more sophisticated rule that will be created in #11831

@Jackie-Jiang Jackie-Jiang added bugfix multi-stage Related to the multi-stage query engine labels Oct 26, 2023
@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2023

Codecov Report

Merging #11882 (e224211) into master (6853991) will decrease coverage by 0.03%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master   #11882      +/-   ##
============================================
- Coverage     61.42%   61.40%   -0.03%     
+ Complexity     1147     1146       -1     
============================================
  Files          2375     2375              
  Lines        128501   128501              
  Branches      19846    19846              
============================================
- Hits          78936    78907      -29     
- Misses        43859    43887      +28     
- Partials       5706     5707       +1     
Flag Coverage Δ
custom-integration1 <0.01% <ø> (ø)
integration <0.01% <ø> (ø)
integration1 <0.01% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 34.80% <ø> (-26.59%) ⬇️
java-21 61.28% <ø> (+0.01%) ⬆️
skip-bytebuffers-false 34.84% <ø> (-26.58%) ⬇️
skip-bytebuffers-true 61.26% <ø> (+<0.01%) ⬆️
temurin 61.40% <ø> (-0.03%) ⬇️
unittests 61.40% <ø> (-0.03%) ⬇️
unittests1 46.63% <ø> (ø)
unittests2 27.56% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 11 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

"\n LogicalTableScan(table=[[a]])",
"\n PinotLogicalExchange(distribution=[hash[0]])",
"\n LogicalProject(col1=[$0], col3=[$2])",
"\n LogicalFilter(condition=[>($2, 10)])",
Copy link
Contributor

@xiangfu0 xiangfu0 Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting for two filters, I think visitFilter in ServerPlanRequestVisitor will override each other, so one predicate will be skipped.

Copy link
Contributor Author

@walterddr walterddr Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not unexpected. (we can potentially make it more smart). the problem is that

WITH tmp1
     AS (SELECT *
         FROM   a
         WHERE  col2 NOT IN ( 'foo', 'bar' )),
     tmp2
     AS (SELECT *
         FROM   b
         WHERE  col1 IN (SELECT col1
                         FROM   tmp1)
                AND col3 < 100)
SELECT *
FROM   tmp2
WHERE  col1 IN (SELECT col1
                FROM   tmp1  -- this is the tmp1 modification into tmp1'
                WHERE  col3 > 10) 

at this stage

  • Calcite's doesn't know if you want to do a table spool of tmp1 and make tmp1' derived from tmp1 --> thus col3 > 10 is not pushed down.
  • only when we decided that table spool is not possible and then we decided to copy tmp1 and tmp1' into 2 separate sub queries, at that time i think the hep optimizer already get passed the filter merging phase.

rewriting the query in this way

WITH tmp1
     AS (SELECT *
         FROM   a
         WHERE  col2 NOT IN ( 'foo', 'bar' )),
     tmp2
     AS (SELECT *
         FROM   b
         WHERE  col1 IN (SELECT col1
                         FROM   tmp1)
                AND col3 < 100),
     tmp3 -- here we explicitly tell that tmp3 and tmp1 are not related
     AS (SELECT *
         FROM   a
         WHERE  col2 NOT IN ( 'foo', 'bar' ) AND col3 > 10),
SELECT *
FROM   tmp2
WHERE  col1 IN (SELECT col1
                FROM   tmp3) 

produces no multi-filter plan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we cannot avoid this, then it means the ServerPlanRequestVisitor should handle the leaf query generation properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a un-plannable sql before or after this bug fix PR. I would suggest differ the fix into a different PR (i will change the test to make sure it doesn't happen)

@walterddr walterddr merged commit 86735ce into apache:master Oct 30, 2023
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix multi-stage Related to the multi-stage query engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[multistage] multiple nested SEMI-join create wrong plan
4 participants