BoundFilter optimizations, and related interface changes. #2727

gianm · 2016-03-24T22:52:56Z

BoundFilter:

- For lexicographic bounds, use bitmapIndex.getIndex to find the start and end points, then union all bitmaps between those points.
- For alphanumeric bounds, iterate through dimValues, and union all bitmaps for values matching the predicate.
- Change behavior for nulls: it used to be that the BoundFilter would never match nulls, now it matches nulls if "" is allowed by the lower limit and not excluded by the upper limit.
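
The two index-based strategies above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `BoundIndexSketch`, `rows`, `matchRange`, and `matchPredicate` are hypothetical names, `java.util.BitSet` stands in for the real bitmap types, and `getIndex` is assumed to behave like `Arrays.binarySearch` (a negative value encoding the insertion point when the value is absent).

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.function.Predicate;

// Hypothetical stand-in for a dictionary-encoded column; not Druid's BitmapIndex.
final class BoundIndexSketch
{
  private final String[] sortedValues; // dictionary, sorted by String.compareTo
  private final BitSet[] bitmaps;      // bitmaps[i] holds the rows containing sortedValues[i]

  BoundIndexSketch(String[] sortedValues, BitSet[] bitmaps)
  {
    this.sortedValues = sortedValues;
    this.bitmaps = bitmaps;
  }

  // Helper: build a bitmap from row numbers.
  static BitSet rows(int... rowNumbers)
  {
    BitSet bitmap = new BitSet();
    for (int row : rowNumbers) {
      bitmap.set(row);
    }
    return bitmap;
  }

  // Assumed contract: the index if present, otherwise -(insertionPoint) - 1.
  int getIndex(String value)
  {
    return Arrays.binarySearch(sortedValues, value);
  }

  // Lexicographic bound: two lookups, then union the bitmaps in [start, end).
  // A null bound means "unbounded" on that side.
  BitSet matchRange(String lower, boolean lowerStrict, String upper, boolean upperStrict)
  {
    int start = 0;
    if (lower != null) {
      int found = getIndex(lower);
      start = found >= 0 ? (lowerStrict ? found + 1 : found) : -(found + 1);
    }
    int end = sortedValues.length;
    if (upper != null) {
      int found = getIndex(upper);
      end = found >= 0 ? (upperStrict ? found : found + 1) : -(found + 1);
    }
    BitSet result = new BitSet();
    for (int i = start; i < end; i++) {
      result.or(bitmaps[i]);
    }
    return result;
  }

  // Other orderings: test every dictionary value and union the matches.
  BitSet matchPredicate(Predicate<String> predicate)
  {
    BitSet result = new BitSet();
    for (int i = 0; i < sortedValues.length; i++) {
      if (predicate.test(sortedValues[i])) {
        result.or(bitmaps[i]);
      }
    }
    return result;
  }

  public static void main(String[] args)
  {
    BoundIndexSketch index = new BoundIndexSketch(
        new String[]{"a", "c", "e"},
        new BitSet[]{rows(0, 3), rows(1), rows(2, 4)}
    );
    System.out.println(index.matchRange("b", false, "e", true));
    System.out.println(index.matchPredicate(v -> v.compareTo("c") >= 0));
  }
}
```

In this sketch the range path does two dictionary lookups whatever the cardinality, and when the range is empty it unions nothing. That is at least consistent with the near-constant matchNothingLexicographic timings reported below, though the sketch does not claim to reproduce them.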

Interface changes:

- BitmapIndex: add `int getIndex(value)` to make it possible to get the index for a value without retrieving the bitmap.
- BitmapIndex: remove `ImmutableBitmap getBitmap(value)`, change callers to `getBitmap(getIndex(value))`.
- BitmapIndexSelector: allow retrieving the underlying BitmapIndex through getBitmapIndex.
- Clarified contract of indexOf in Indexed, GenericIndexed.
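
For callers, the `getBitmap(value)` removal amounts to a two-step lookup. The sketch below shows the shape of that call-site change under stated assumptions: `SketchBitmapIndex` and `ArraySketchBitmapIndex` are hypothetical names, `BitSet` replaces `ImmutableBitmap`, and returning an empty bitmap for a negative index is an assumption of this example rather than something the description states.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical, simplified interface; the real BitmapIndex returns ImmutableBitmap.
interface SketchBitmapIndex
{
  int getIndex(String value);   // assumed: negative when the value is absent

  BitSet getBitmap(int index);  // assumed: empty bitmap for a negative index
}

final class ArraySketchBitmapIndex implements SketchBitmapIndex
{
  private final String[] sortedValues; // dictionary, sorted by String.compareTo
  private final BitSet[] bitmaps;      // bitmaps[i] holds the rows containing sortedValues[i]

  ArraySketchBitmapIndex(String[] sortedValues, BitSet[] bitmaps)
  {
    this.sortedValues = sortedValues;
    this.bitmaps = bitmaps;
  }

  @Override
  public int getIndex(String value)
  {
    return Arrays.binarySearch(sortedValues, value);
  }

  @Override
  public BitSet getBitmap(int index)
  {
    // Return a copy so callers cannot mutate the stored bitmap.
    return index < 0 ? new BitSet() : (BitSet) bitmaps[index].clone();
  }

  public static void main(String[] args)
  {
    BitSet a = new BitSet();
    a.set(0);
    BitSet c = new BitSet();
    c.set(1);
    c.set(2);
    SketchBitmapIndex index = new ArraySketchBitmapIndex(new String[]{"a", "c"}, new BitSet[]{a, c});

    // Old style (removed): index.getBitmap("c")
    // New style: look up the index first, then fetch the bitmap by index.
    System.out.println(index.getBitmap(index.getIndex("c")));
    System.out.println(index.getBitmap(index.getIndex("b")));
  }
}
```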

Also added tests for SelectorFilter, NotFilter, and BoundFilter.

gianm · 2016-03-24T22:53:47Z

benchmarks:

roaring - new code
Benchmark                                          (cardinality)  Mode  Cnt       Score       Error  Units
BoundFilterBenchmark.matchEverythingAlphaNumeric            1000  avgt   10     922.892 ±    49.307  us/op
BoundFilterBenchmark.matchEverythingAlphaNumeric          100000  avgt   10   55084.688 ±  2612.769  us/op
BoundFilterBenchmark.matchEverythingAlphaNumeric         1000000  avgt   10  496876.650 ± 24082.952  us/op
BoundFilterBenchmark.matchEverythingLexicographic           1000  avgt   10     685.352 ±    21.630  us/op
BoundFilterBenchmark.matchEverythingLexicographic         100000  avgt   10   15020.188 ±   677.871  us/op
BoundFilterBenchmark.matchEverythingLexicographic        1000000  avgt   10  138014.970 ±  7010.974  us/op
BoundFilterBenchmark.matchHalfAlphaNumeric                  1000  avgt   10     419.717 ±    14.421  us/op
BoundFilterBenchmark.matchHalfAlphaNumeric                100000  avgt   10   44819.571 ±  2382.226  us/op
BoundFilterBenchmark.matchHalfAlphaNumeric               1000000  avgt   10  358544.263 ± 12146.582  us/op
BoundFilterBenchmark.matchHalfLexicographic                 1000  avgt   10     229.177 ±     9.528  us/op
BoundFilterBenchmark.matchHalfLexicographic               100000  avgt   10   25188.704 ±   626.630  us/op
BoundFilterBenchmark.matchHalfLexicographic              1000000  avgt   10   72892.182 ±  2968.422  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric               1000  avgt   10     194.890 ±     7.406  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric             100000  avgt   10   20405.402 ±   932.016  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric            1000000  avgt   10  197780.159 ±  8140.001  us/op
BoundFilterBenchmark.matchNothingLexicographic              1000  avgt   10       1.611 ±     0.063  us/op
BoundFilterBenchmark.matchNothingLexicographic            100000  avgt   10       2.663 ±     0.156  us/op
BoundFilterBenchmark.matchNothingLexicographic           1000000  avgt   10       3.264 ±     0.172  us/op

roaring - old code
Benchmark                                          (cardinality)  Mode  Cnt        Score       Error  Units
BoundFilterBenchmark.matchEverythingAlphaNumeric            1000  avgt   10     1813.263 ±    51.723  us/op
BoundFilterBenchmark.matchEverythingAlphaNumeric          100000  avgt   10   203253.535 ±  7073.590  us/op
BoundFilterBenchmark.matchEverythingAlphaNumeric         1000000  avgt   10  2608738.649 ± 89007.817  us/op
BoundFilterBenchmark.matchEverythingLexicographic           1000  avgt   10     2010.454 ±    65.619  us/op
BoundFilterBenchmark.matchEverythingLexicographic         100000  avgt   10   231639.118 ± 12078.642  us/op
BoundFilterBenchmark.matchEverythingLexicographic        1000000  avgt   10  2608982.848 ± 41008.018  us/op
BoundFilterBenchmark.matchHalfAlphaNumeric                  1000  avgt   10      917.824 ±    45.221  us/op
BoundFilterBenchmark.matchHalfAlphaNumeric                100000  avgt   10   129332.609 ±  3458.142  us/op
BoundFilterBenchmark.matchHalfAlphaNumeric               1000000  avgt   10  1397767.744 ± 37115.753  us/op
BoundFilterBenchmark.matchHalfLexicographic                 1000  avgt   10     1009.587 ±    37.842  us/op
BoundFilterBenchmark.matchHalfLexicographic               100000  avgt   10   129598.064 ±  6365.075  us/op
BoundFilterBenchmark.matchHalfLexicographic              1000000  avgt   10  1502001.883 ± 59413.536  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric               1000  avgt   10      202.195 ±     9.855  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric             100000  avgt   10    20334.395 ±   694.945  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric            1000000  avgt   10   200217.915 ±  3852.198  us/op
BoundFilterBenchmark.matchNothingLexicographic              1000  avgt   10      289.969 ±     5.781  us/op
BoundFilterBenchmark.matchNothingLexicographic            100000  avgt   10    28739.014 ±  1620.569  us/op
BoundFilterBenchmark.matchNothingLexicographic           1000000  avgt   10   288558.449 ±  8818.323  us/op

concise - new code
Benchmark                                          (cardinality)  Mode  Cnt        Score        Error  Units
BoundFilterBenchmark.matchEverythingAlphaNumeric            1000  avgt   10      825.754 ±     34.183  us/op
BoundFilterBenchmark.matchEverythingAlphaNumeric          100000  avgt   10   162995.602 ±   8164.037  us/op
BoundFilterBenchmark.matchEverythingAlphaNumeric         1000000  avgt   10  1730761.586 ±  65781.270  us/op
BoundFilterBenchmark.matchEverythingLexicographic           1000  avgt   10      706.111 ±     31.862  us/op
BoundFilterBenchmark.matchEverythingLexicographic         100000  avgt   10   146757.525 ±  10171.468  us/op
BoundFilterBenchmark.matchEverythingLexicographic        1000000  avgt   10  1501751.336 ± 175176.476  us/op
BoundFilterBenchmark.matchHalfLexicographic                 1000  avgt   10      341.310 ±     25.054  us/op
BoundFilterBenchmark.matchHalfLexicographic               100000  avgt   10    45627.867 ±   2141.370  us/op
BoundFilterBenchmark.matchHalfLexicographic              1000000  avgt   10   500524.691 ±  39671.188  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric               1000  avgt   10      127.806 ±      6.626  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric             100000  avgt   10    13703.036 ±    758.373  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric            1000000  avgt   10   122963.784 ±   6927.395  us/op
BoundFilterBenchmark.matchNothingLexicographic              1000  avgt   10        1.646 ±      0.080  us/op
BoundFilterBenchmark.matchNothingLexicographic            100000  avgt   10        2.673 ±      0.179  us/op
BoundFilterBenchmark.matchNothingLexicographic           1000000  avgt   10        3.268 ±      0.147  us/op

concise - old code
Benchmark                                          (cardinality)  Mode  Cnt        Score        Error  Units
BoundFilterBenchmark.matchEverythingAlphaNumeric            1000  avgt   10     1599.303 ±     45.809  us/op
BoundFilterBenchmark.matchEverythingAlphaNumeric          100000  avgt   10   345393.965 ±  14189.766  us/op
BoundFilterBenchmark.matchEverythingAlphaNumeric         1000000  avgt   10  3684987.846 ±  84406.901  us/op
BoundFilterBenchmark.matchEverythingLexicographic           1000  avgt   10     1976.324 ±     54.248  us/op
BoundFilterBenchmark.matchEverythingLexicographic         100000  avgt   10   357991.768 ±   7785.695  us/op
BoundFilterBenchmark.matchEverythingLexicographic        1000000  avgt   10  3582925.614 ± 164209.427  us/op
BoundFilterBenchmark.matchHalfLexicographic                 1000  avgt   10     1186.107 ±     74.529  us/op
BoundFilterBenchmark.matchHalfLexicographic               100000  avgt   10   167451.872 ±   9642.969  us/op
BoundFilterBenchmark.matchHalfLexicographic              1000000  avgt   10  1890634.368 ±  61274.830  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric               1000  avgt   10      128.502 ±      8.668  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric             100000  avgt   10    13224.496 ±    931.381  us/op
BoundFilterBenchmark.matchNothingAlphaNumeric            1000000  avgt   10   128046.164 ±   5413.171  us/op
BoundFilterBenchmark.matchNothingLexicographic              1000  avgt   10      286.863 ±     12.302  us/op
BoundFilterBenchmark.matchNothingLexicographic            100000  avgt   10    28887.794 ±   1434.066  us/op
BoundFilterBenchmark.matchNothingLexicographic           1000000  avgt   10   286646.487 ±  13587.081  us/op

fjy · 2016-03-25T00:00:27Z

@gianm should we change the default to roaring, since it seems to be much better in most cases?
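
To put rough numbers on that comparison, the sketch below divides the concise score by the roaring score for each benchmark that appears in both "new code" tables at cardinality 1000000. The scores are copied from the tables above; the class and method names are hypothetical, and the error margins are ignored, so the ratios are only indicative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BitmapRatioSketch
{
  // Scores in us/op, "new code" tables, cardinality 1000000: {roaring, concise}.
  static Map<String, double[]> newCodeScores()
  {
    Map<String, double[]> scores = new LinkedHashMap<>();
    scores.put("matchEverythingAlphaNumeric", new double[]{496876.650, 1730761.586});
    scores.put("matchEverythingLexicographic", new double[]{138014.970, 1501751.336});
    scores.put("matchHalfLexicographic", new double[]{72892.182, 500524.691});
    scores.put("matchNothingAlphaNumeric", new double[]{197780.159, 122963.784});
    scores.put("matchNothingLexicographic", new double[]{3.264, 3.268});
    return scores;
  }

  // Ratio above 1 means roaring was faster; below 1 means concise was faster.
  static double conciseOverRoaring(String benchmark)
  {
    double[] score = newCodeScores().get(benchmark);
    return score[1] / score[0];
  }

  public static void main(String[] args)
  {
    for (String benchmark : newCodeScores().keySet()) {
      System.out.printf("%-30s %.2f%n", benchmark, conciseOverRoaring(benchmark));
    }
  }
}
```

On these figures roaring is faster in three of the five shared benchmarks, roughly even in matchNothingLexicographic, and slower in matchNothingAlphaNumeric.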

fjy · 2016-03-25T00:06:36Z

processing/src/main/java/io/druid/segment/filter/BoundFilter.java

          }
        }
    );
  }
+
+  private boolean doesMatch(String input)
+  {


is this logic just copypasta of what was there before?

ah, no, it is not

mostly yes, except for the null handling.
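
The null handling referred to here follows the rule in the PR description: a null matches if "" is allowed by the lower limit and not excluded by the upper limit. A minimal sketch of that rule for lexicographic bounds, treating null as "" before comparing, might look like the following. It is not the body of `doesMatch` from this patch; the class name and the parameter list are hypothetical, and a null bound is assumed to mean "unbounded".

```java
final class BoundMatchSketch
{
  // Hypothetical signature: the patch's doesMatch(String input) reads its bounds from the filter.
  static boolean doesMatch(
      String input,
      String lower,
      boolean lowerStrict,
      String upper,
      boolean upperStrict
  )
  {
    // Treat null as the empty string, so it is subject to the same bound checks.
    String value = input == null ? "" : input;

    if (lower != null) {
      int cmp = value.compareTo(lower);
      if (lowerStrict ? cmp <= 0 : cmp < 0) {
        return false; // below the lower limit, or equal to a strict one
      }
    }
    if (upper != null) {
      int cmp = value.compareTo(upper);
      if (upperStrict ? cmp >= 0 : cmp > 0) {
        return false; // above the upper limit, or equal to a strict one
      }
    }
    return true;
  }

  public static void main(String[] args)
  {
    System.out.println(doesMatch(null, "", false, "z", false)); // "" allowed by the lower limit
    System.out.println(doesMatch(null, "a", false, null, false)); // "" sorts below "a"
  }
}
```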

fjy · 2016-03-25T15:07:36Z

👍

gianm · 2016-03-25T17:44:07Z

Some of the new tests depend on #2737 to work.


jon-wei · 2016-03-25T22:06:11Z

processing/src/main/java/io/druid/segment/ColumnSelectorBitmapIndexSelector.java

@@ -111,6 +112,17 @@ public BitmapFactory getBitmapFactory()
  }

  @Override
+  public BitmapIndex getBitmapIndex(String dimension)


Should `public BitmapIndex getBitmapIndex(String dimension)` and `public ImmutableBitmap getBitmapIndex(String dimension, String value)` be named differently? It may be slightly confusing to have functions with the same name but different return types.

I guess I didn't want to touch too many files in this patch (all of the other filters), but I also didn't want to use a different name for this method, since it's the one that really should be called getBitmapIndex.

I am indifferent though & happy to go with whatever people think is best

ok, I'm fine with this PR as-is, that rename if desired could be handled in later patch

jon-wei · 2016-03-25T22:54:12Z

👍 after travis

gianm added the Improvement label Mar 24, 2016

gianm added this to the 0.9.1 milestone Mar 24, 2016

gianm mentioned this pull request Mar 24, 2016

Druid filter extensions #2613

Closed

gianm force-pushed the optimize-bound-filter branch from 8207545 to a7e4b1b Compare March 24, 2016 23:04

fjy reviewed Mar 25, 2016

jon-wei mentioned this pull request Mar 25, 2016

Allow filters to use extraction functions #2690

Merged

gianm force-pushed the optimize-bound-filter branch from a7e4b1b to 77fed15 Compare March 25, 2016 17:23

gianm mentioned this pull request Mar 25, 2016

Fix predicate-based ValueMatcher behavior for IncrementalIndex on missing columns. #2737

Merged

gianm force-pushed the optimize-bound-filter branch from 77fed15 to 2970b49 Compare March 25, 2016 21:11

jon-wei reviewed Mar 25, 2016

fjy closed this Mar 26, 2016

fjy reopened this Mar 26, 2016

fjy merged commit 7fe277e into apache:master Mar 27, 2016

fjy mentioned this pull request May 20, 2016

[WIP] Druid 0.9.1 Release Notes #2999

Closed

gianm deleted the optimize-bound-filter branch August 10, 2016 05:14

gianm mentioned this pull request Aug 10, 2016

Restore optimizations in BoundFilter. #3343

Merged

