-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite #14587
fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite #14587
Conversation
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java
Show resolved
Hide resolved
a3dc3ac
to
89aa0b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for the PR. The recursive one is a nice touch to handle the complex filters
209c889
to
93afd47
Compare
final OrFilter preOrFilter = new OrFilter(preFilterList); | ||
filterSplitter.addPreFilter(preOrFilter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we always make an OrFilter
if we are only checking for BooleanFilter
to break down? Isn't this going to turn AndFilter
into OrFilter
? Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are previously this code was dealign with only OR filters, we need to change the final filter based on type. I Need refactor the code to create new PreFilterList.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh shoot, I missed that. Good catch. We are extracting the filters and adding each to the pre-filter list. Maybe at each level of recursion, we need to recreate the And/Or filter and add it back to the pre-filter list. In such a case the number of filters in pre-filter list will be equal to the original filter size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@somu-imply @clintropolis Pushed a change to address the comment. Can you please review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be an issue with the test test_pushdown_or_filters_unnested_and_original_dimension_with_unnest_adapters
as the expected results have changed. I'm going through the code today
35bee11
to
42c0a61
Compare
)) | ||
)); | ||
|
||
final OrFilter expectedPushDownFilter = new OrFilter(ImmutableList.of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a similar test case on AND of OR filters as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding more test case
Almost there. Suggesting 1 more change: There are 2 cases:
Here since d3 can be mapped directly to an input column dim3, the filter can be reconstructed as a pre-filter with dim3 IN (a,b) or m1<10 and pushed to the base cursor.
since the filter outside contains an OR filter and
|
42c0a61
to
7d9518e
Compare
bound("dimZipf", "27", "27", false, false, null, StringComparators.NUMERIC), | ||
new SelectorDimFilter("j0.unnest", "Baz", null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since #14542 has been merged, this should use range
and equality
instead of bound
and making a SelectorDimFilter
directly. same for other tests added in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
3b27907
to
dd1f92d
Compare
processing/src/test/java/org/apache/druid/segment/UnnestStorageAdapterTest.java
Fixed
Show fixed
Hide fixed
26c4fb1
to
77b62a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the code and adding those tests. I have some comments so far, mostly cosmetic.
processing/src/main/java/org/apache/druid/segment/filter/Filters.java
Outdated
Show resolved
Hide resolved
return new SelectorDimFilter(dimension, value, extractionFn).toFilter(); | ||
} | ||
|
||
public static DimFilter sdfd(String dimension, String value, ExtractionFn extractionFn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. maybe some better names...but am good with these too
* 3. Filters on unnest column which is derived from Array or any other Expression can not be pushe down to base. | ||
* for example: a=1 and vc=3 , lets say vc is ExpressionVirtualColumn, and vc=3 can not be push down to base even if top level is AND filter. | ||
* 4. | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments, for future developers can we make it a bit clear with why in case 1, the entire filter (b=2 OR c=2) cannot be pushed down to the preFilter. Probably we can add an example saying that either b2 or c2 references a column that is not a part of the input for the case of unnest.
Something like, in unnest we come across 2 cases:
- unnesting a single dimension e.g. select * from foo, UNNEST(MV_TO_ARRAY(dim3)) as u(d3)
- Unnesting an expression from multiple columns e.g. select * from foo, UNNEST(ARRAY[dim1,dim2]) as u(d12)
In case 1, d3 is a direct reference to dim3 so any filter using d3 can be rewritten using dim3 and added to pre filter while in case2, due to presence of the expression virtual column expressionVirtualColumn("j0.unnest", "array(\"dim1\",\"dim2\")", ColumnType.STRING_ARRAY)
the filters on d12 cannot be pushed to the pre filters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the comments
77b62a6
to
ccabe2b
Compare
ccabe2b
to
7210ae7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to add some tests with deeper nesting to UnnestStorageAdapterTest
just to flex the recursive stuff a bit more
NullHandling.sqlCompatible() | ||
? equality("dimZipf", "27", ColumnType.LONG) | ||
: bound("dimZipf", "27", "27", false, false, null, StringComparators.NUMERIC), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use numericEquality
instead of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -44,4 +47,19 @@ public static SelectorFilter selector(final String fieldName, final String value | |||
{ | |||
return new SelectorFilter(fieldName, value, null); | |||
} | |||
|
|||
public static Filter sdf(String dimension, String value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need to exist instead of just using selector
? I don't see any callers using the version that takes extractionFn, and only one caller using sdfd
which seems overkill..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed static functions that I added
|
||
public FilterSplitter( | ||
String inputColumn, | ||
ColumnCapabilities inputColumnCapabilites, VirtualColumns queryVirtualColumns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit formatting, VirtualColumns
should be on a separate line (the style checker has trouble enforcing this sometimes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
c3e0a33
to
0508ec4
Compare
preFilterList.add(filter); | ||
// requires check for OR and And filters, disqualify rewrite for non-unnest filters | ||
if (queryFilter instanceof BooleanFilter) { | ||
int originalFilterCount = Filters.countNumberOfFilters(queryFilter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like it would be a lot more efficient to do both the before and after count as part of the traversal that the filter splitter does, any reason I'm missing not to do that? Otherwise it seems like we have to basically traverse the filter tree 3 times per segment, before to get a count, then to try to rewrite, then after to get the new count. It seems like this could all just be baked into the FilterSplitter
itself. Apologies for not spotting this earlier on in review... but if we do this then i think we no longer need Filters.countNumberOfFilters
@@ -422,4 +423,19 @@ private static LinkedHashSet<Filter> flattenOrChildren(final Collection<Filter> | |||
|
|||
return retVal; | |||
} | |||
|
|||
public static int countNumberOfFilters(Filter filter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if this method needs to stay (re if other comment suggesting pushing counting inside of filter splitter does not work out for some reason) then this argument should be marked @Nullable
)); | ||
testComputeBaseAndPostUnnestFilters( | ||
testQueryFilter, | ||
"unnested-multi-string1 = 3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seem right, shouldn't it be pushing down multi-string1 = 3
since unnested-multi-string1
doesn't exist on the base adapter? I think the problem is that testComputeBaseAndPostUnnestFilters
is not getting the virtual column capabilities correctly, which makes it use the default capabilities to define it as float typed rather than string typed, so it isn't rewritten correctly in rewriteFilterOnUnnestColumnIfPossible
because it is not string typed.
If i change the line to get the capabilities from the unnest storage adapter
vc.capabilities(UNNEST_STORAGE_ADAPTER, inputColumn)
, the test fails because the pushed down filter is (multi-string1 = 3 && multi-string1 = 2)
which seems like the correct answer? Also makes most of the other tests fail expectations. I think we should probably add a check to ensure that filters on the unnest column never get pushed down, since I think that would effectively make the results null since the filter would match nothing since the column doesn't exist (unless it was filtering for nulls i guess)
null, | ||
VirtualColumns.EMPTY, | ||
inputColumn, | ||
vc.capabilities(inputColumn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be
vc.capabilities(inputColumn) | |
vc.capabilities(UNNEST_STORAGE_ADAPTER, inputColumn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
)); | ||
testComputeBaseAndPostUnnestFilters( | ||
testQueryFilter, | ||
"multi-string1 = 3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it that only the filter on the unnest column pushed down instead of the whole and? same question for other test cases involving an AND? I think I would expect that in the case of AND especially, everything which can be pushed down is pushed down. This is what the comments in computeBaseAndPostUnnestFilters
indicates should happen..
In the case of an AND filter, I think that everything can be pushed down except for the case where the unnest output column is not a simple direct access to a MVD. Since it is an AND though, I think all of the other clauses could still be pushed down in that case, leaving only the unnest filter which could not be rewritten as a post filter.
But in the case of OR filters, I think its either all or nothing can be pushed down. So if there is a filter on the unnest output column where the virtual column is not a simple direct column access to a multi-value string, then nothing can be pushed down, else if the unnest column filters can be rewritten because they are simple mvd direct access, everything can be pushed down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this test, it is case of And filters, and We only push down the filters to base where rewriteFilterOnUnnestColumnIfPossible is successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed the bug in filter counts and now tests looks good, Can you please verify again?
filterSplitter.addPreFilter(newFilter != null ? newFilter : filter); | ||
filterSplitter.addToPreFilterCount(1); | ||
} | ||
filterSplitter.addToOriginalFilterCount(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the filter getting added to the preFilter list twice or counted twice with addToPreFilterCount ? It already got added to preFilterList once in line 503. This would cause the count to go up and when you make the comparison of count on line 330, it would give the wrong count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
b4b0ec1
to
a0a8324
Compare
a0a8324
to
96a8208
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, two optimizations that need to be done are:
- Make sure not filters on unnested column are not pushed to base
- Update post filters for and filters to include only the subset not pushed to base
These can be done in a followup PR to not delay this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
per comment (and agreeing with @somu-imply) I think there is still some follow-up work to be done. Also it would be nice to have a few more tests where it can't push anything down, such as when filter is on unnest column that is wrapped in some expression like array_slice which would make the virtual column more than a direct access and so shouldn't be eligible for pushdown.
testComputeBaseAndPostUnnestFilters( | ||
testQueryFilter, | ||
"(multi-string1 = 3 && multi-string1 = 2)", | ||
"(unnested-multi-string1 = 3 && multi-string1 = 2)" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is still some work to do here, but is ok to do as a follow-up.
Side note, this would be a very strange query since there is like coincidentally a filter on the underlying unnest column that is referring to it directly alongside a filter on the unnested column itself. I think it would make sense to migrate a lot of these tests away from having filters on both the unnest column and its underlying column, since I cant imagine its going to be common in practice.
But thinking about the more general case, like if it was WHERE unnested-multi-string1 = 3 && other-column = 2
i the best rewrite would be
testQueryFilter,
"(multi-string1 = 3 && other-column= 2)",
"unnested-multi-string1 = 3"
);```
but the current code leaves the `other-column = 2` in the post-filters, which isn't necessary. It would be a pretty big optimization to leave these off and only have to do post-filters when absolutely necessary. OR filters cannot have the same optimization.
But, i think it should be ok to do this in a follow-up PR, since there is some other correctness stuff to be done as well, such as skipping pushdown of NOT filters, which could incorrectly filter out too much before the post-filter has a chance to match stuff.
Fixes #XXXX.
Description
This PR fixes the bug in the following query, currently it throws the exception
org.apache.druid.query.QueryException: Received a non-applicable rewrite: {j0.unnest=c2}, filter's dimension: c1
Whenever we have And / OR filters on unnest columns, we rewrite the filters on unnest cols.
We were covering OR filters and disqualifying the rewrite for non unnest cols and we have to to do same thing with AND filters as well. This PR checks for Boolean filter and disqualify the rewrite for non unnest cols.
Adding the 3 tests that covers the given scenario.
3rd test case cover the nested filer scenario for unnest filters, This PR address the unpack the nested OR/AND filters and attempt to rewrite only for qualified filters.
Thank you
Fixed the bug ...
Renamed the class ...
Added a forbidden-apis entry ...
Release note
Key changed/added classes in this PR
MyFoo
OurBar
TheirBaz
This PR has: