-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Two-stage filtering #3018
Two-stage filtering #3018
Conversation
{ | ||
ImmutableBitmap roaringBitmap = bitmapIndex; | ||
if (!(bitmapIndex instanceof WrappedImmutableRoaringBitmap)) { | ||
final BitmapFactory factory = RoaringBitmapSerdeFactory.bitmapFactory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems pretty expensive if the bitmap isn't roaring
are we planning on making roaring the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not planning on changing the default bitmap to roaring with this PR at least.
if anyone knows of a faster way to get a reverse bitmap iterator, please let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a better way for concise.
Also this code is just moved from lower down in the same file, it was added as part of the descending select patch. It should only get used for descending queries.
@@ -327,7 +394,34 @@ public Cursor apply(final Long input) | |||
{ | |||
private final Offset initOffset = offset.clone(); | |||
private final DateTime myBucket = gran.toDateTime(input); | |||
private Offset cursorOffset = offset; | |||
private CursorOffsetHolder cursorOffsetHolder = new CursorOffsetHolder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added final
return current; | ||
} | ||
|
||
private static Filter flatten(Filter root) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not from Hive?
public boolean supportsBitmapIndex(BitmapIndexSelector selector) | ||
{ | ||
for (Filter filter : filters) { | ||
if(!filter.supportsBitmapIndex(selector)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting
49823ad
to
df2a1bd
Compare
if(row.size() == 0) { | ||
return matchNull; | ||
} | ||
for (int rowVal : row) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely will be better to use get
to avoid boxing and unboxing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed for loop to use get
Would be great to see the ability to filter on long columns (including __time, both as a long and with time-based extractionFn) as a follow up. It seems like that should be doable, and useful, with existing filters like "selector" and "bound". use case e.g. https://groups.google.com/d/msg/druid-user/9rJBeMT0SeQ/G_hBwzwbAAAJ |
@nishantmonu51 are you able to take another look? |
@gianm: sure, I ll check out the changes. |
@@ -115,7 +115,56 @@ public BitmapFactory getBitmapFactory() | |||
public BitmapIndex getBitmapIndex(String dimension) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably also need to modify getBitmapIndex(String dimension, String value) below in this class so as to return null when column does not have bitmap index ?
would also be nice to have the same logic at one place and use it in both the methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I changed getBitmapIndex(String dimension, String value)
to return null for that case.
re: reusing logic, I'd prefer to leave this as-is, I assume you mean this block:
if (Strings.isNullOrEmpty(value)) {
return bitmapFactory.complement(bitmapFactory.makeEmptyImmutableBitmap(), getNumRows());
} else {
return bitmapFactory.makeEmptyImmutableBitmap();
}
I could have public ImmutableBitmap getBitmap(int idx)
in the anonymous BitmapIndex call getBitmapIndex(String dimension, String value)
, but I would still have to check for idx == 0, and would need to pass in some bogus non-null/empty value for the idx != 0 case.
I think it looks cleaner as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
👍 after #3018 (comment) gets addressed. |
Implements the proposal in #2742
This PR adds support for filtering without bitmap indexes, by adding a new filtering step during Cursor iteration in the QueryableIndexStorageAdapter.
The linked proposal contains details about the approach taken, with the exception that filter match result caching is not implemented (can be a later PR).
Benchmarks, using the benchmarks in #2875
Master:
Patch:
current results for reading 1,500,000 rows from a long column, no filters:
Benchmark that applies a single selector filter to a 750,000 row segment:
Master
Patch
Benchmark only applicable to this patch: