Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identify not range filters without negating subexpressions #15766

Merged
merged 2 commits into from
Feb 6, 2024

Conversation

kgyrtkirk
Copy link
Member

@kgyrtkirk kgyrtkirk commented Jan 26, 2024

Earlier betweenish (range/bounds) filters were identified thru a process of negating the subexpressions which may have not performed that well. (it could have dominated the runtime in some cases) This patch makes that unnecessary as its able to create the negate expression directly.

This behaviour was observed by @gargvishesh in PR-15688

Fixes #XXXX.

Description


Key changed/added classes in this PR
  • CombineAndSimplifyBounds

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.

Earlier betweenish (range/bounds) filters were identified thru
a process of negating the subexpressions which may have not performed that well.
(it could have dominated the runtime in some cases)
This patch makes that unnecessary as its able to create the negate expression directly.
@@ -270,6 +269,18 @@ private static DimFilter doSimplify(final List<DimFilter> children, boolean disj
childrenToAdd.add(Ranges.toFilter(rangeRefKey, range));
}
}
} else {
if (disjunction && rangeSet.asRanges().size() == 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this approach generate a good plan (two range filters and'ed together) if there are two NOT BETWEEN? Like,

SELECT COUNT(*)
FROM foo
WHERE
  dim1 NOT BETWEEN 'a' AND 'c'
  AND dim1 NOT BETWEEN 'e' AND 'g'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originally I just wanted to pass all the tests :D

...it doesn't handle that - but it does handle cases better when at the same level there are candidates for BETWEEN and for NOT BETWEEN as well

I'll think about it a bit and update the patch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could check the complement whenever the range set span covers all(), even if there's more than two ranges?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was making a comment - but didn't finished it:

for the conditions

  dim1 NOT BETWEEN 'a' AND 'c'  AND dim1 NOT BETWEEN 'e' AND 'g'

the equiv non-between form is like this:

(dim1 < 'a' or c < dim1 ) AND ( dim1 < 'e' OR 'g' < dim1 )

because it has 2 OR leafs which would be handled - its ok.

the more complicated case could be:

x < -10 or ( x between 0 and 10 )

for which there is no need to add not between stuff - it won't be much better => if it doesn't span all there is no need for those things - as the negated ranges will not be that much better

for something like

x < -10 or ( x between 0 and 10 ) or 100 < x

it could possibly be rewritten to

not ( x between -10 and 0 ) and not (x between 10 and 100)
not ( ( x between -10 and 0 ) or  (x between 10 and 100) )

I've added some stuff - but since these were not covered by existing tests; I had to add some new tests as well....

probably recognizing not(rangeFilter()) could be also beneficial; but I've left that out for now

@gianm
Copy link
Contributor

gianm commented Feb 6, 2024

LGTM, thanks!

@gianm gianm merged commit 392d585 into apache:master Feb 6, 2024
83 checks passed
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants