optimize numeric column null value checking for low filter selectivity (more rows) #8822

clintropolis · 2019-11-05T01:34:48Z

Description

I would like to add documentation for Druid SQL compatible null handling added in #4349 to the website, in part because it has been merged for quite a while now, but also so that we can feel better about the changes #8566 causes in SQL behavior in Druid's native mode (tl;dr some Calcite optimizations lead to some query incompatibilities, specifically around our treatment of '' and null as equivalent).

Part of like.. being responsible and stuff, before adding this documentation I first wanted to collect some data to determine if it was a good idea to in fact document it at this time. The place I was specifically worried about was the isNull check, so I added a benchmark, NullHandlingBitmapGetVsIteratorBenchmark, and ran some experiments to see what to expect as well as if we could do better.

The nature of selectors lends itself to using a bitmap iterator as an alternative to calling bitmap.get for every isNull check, so I first tested get vs iterator on 500k row 'columns' with various null bitmap densities and filter selectivities to simulate the overhead of a null value handling selector with each approach. I have so far only collected information on roaring, because with concise the get method takes far too long to complete at low selectivity:

# Parameters: (bitmapType = concise, filterMatch = 0.99999, nullDensity = 0.5, numRows = 500000)

# Run progress: 0.00% complete, ETA 00:22:30
# Fork: 1 of 1
# Warmup Iteration   1: 641486825.960 us/op
...

Key changed/added classes in this PR

ImmutableBitmap, WrappedRoaringBitmap, WrappedConciseBitmap, WrappedImmutableRoaringBitmap, WrappedImmutableConciseBitmap,
ColumnarDoubles, ColumnarLongs, ColumnarFloats


suneet-s · 2019-11-05T04:28:06Z

The heatmaps look super cool! (although I don't think I fully understand them yet :| ) What did you use to build them?

clintropolis · 2019-11-05T08:34:13Z

The heatmaps look super cool! (although I don't think I fully understand them yet :| ) What did you use to build them?

Hah, thanks, I used R with ggplot2 to make them. I'll try to clean up the code and attach it, if I have a chance, in case anyone else wants to do some benchmarking or tinker with the results. As for what they mean, I'll try my best to explain it as succinctly as possible 😅.

The benchmark I added in this PR, NullHandlingBitmapGetVsIteratorBenchmark, is simulating approximately what happens during query processing on a historical for numerical null columns when used with something like a NullableAggregator, which is a wrapper around another Aggregator to ignore null values or delegate aggregation to the wrapped aggregator for rows that have actual values.

When SQL compatible null handling is enabled, numeric columns are stored in 2 parts if nulls are present: the column itself, and a bitmap that has a set bit for each null value. At query time, filters are evaluated to compute something called an Offset, which is basically just the set of rows that are taking part in the query, and which is used to create a column value/vector selector for those rows from the underlying column. Selectors have an isNull method which can be used to determine if a particular row is null, and for numeric columns this checks whether that row is set in the bitmap. So mechanically, NullableAggregator will check each row from the selector to see if it is null (through the underlying bitmap); if it is, it ignores the row, and if not, it delegates to the underlying Aggregator to do whatever it does to compute the result.
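A minimal sketch of that null-skipping delegation, assuming java.util.BitSet as a stand-in for the null bitmap and an int array as a stand-in for the Offset. LongAgg, LongSum and aggregateNonNull are hypothetical names for illustration only; Druid's real Aggregator and selector interfaces are richer than this.

```java
import java.util.BitSet;

final class NullableSumSketch {
  /** Hypothetical minimal aggregator; not Druid's actual Aggregator interface. */
  interface LongAgg {
    void aggregate(long value);
    long get();
  }

  /** Simple sum used as the wrapped aggregator. */
  static final class LongSum implements LongAgg {
    private long sum;
    @Override public void aggregate(long value) { sum += value; }
    @Override public long get() { return sum; }
  }

  /** Skips rows whose null bit is set and delegates every other selected row. */
  static long aggregateNonNull(long[] column, BitSet nulls, int[] offset, LongAgg delegate) {
    for (int row : offset) {
      if (!nulls.get(row)) {           // the isNull check, backed by the null bitmap
        delegate.aggregate(column[row]);
      }
    }
    return delegate.get();
  }

  public static void main(String[] args) {
    long[] column = {10, 0, 30, 0, 50};
    BitSet nulls = new BitSet();
    nulls.set(1);
    nulls.set(3);
    int[] offset = {0, 1, 2, 4};       // rows selected by the filter
    System.out.println(aggregateNonNull(column, nulls, offset, new LongSum())); // prints 90
  }
}
```

Row 1 is selected but null, so it is ignored; row 3 is null but not selected, so it is never visited.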

The benchmark simplifies this concept into using a BitSet to simulate the Offset, an ImmutableBitmap for the null value bitmap, and a for loop that iterates over the "rows" selected by the BitSet to emulate the behavior of the aggregator on the selector, checking for set bits in the ImmutableBitmap for each index like isNull would be doing.
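The two strategies being compared can be sketched roughly as follows. This is not the benchmark code from the PR: it uses java.util.BitSet for both the "Offset" and the null bitmap (instead of ImmutableBitmap), and the class and method names are made up for illustration.

```java
import java.util.BitSet;

final class GetVsIteratorSketch {
  /** Strategy 1: random-access lookup in the null bitmap for every selected row. */
  static int countNullsWithGet(BitSet selectedRows, BitSet nulls) {
    int count = 0;
    for (int row = selectedRows.nextSetBit(0); row >= 0; row = selectedRows.nextSetBit(row + 1)) {
      if (nulls.get(row)) {
        count++;
      }
    }
    return count;
  }

  /** Strategy 2: walk the null bitmap in step with the selected rows. */
  static int countNullsWithIterator(BitSet selectedRows, BitSet nulls) {
    int count = 0;
    int nextNull = nulls.nextSetBit(0);  // -1 once the null bitmap is exhausted
    for (int row = selectedRows.nextSetBit(0); row >= 0; row = selectedRows.nextSetBit(row + 1)) {
      // step the null cursor forward one value at a time until it reaches the current row
      while (nextNull >= 0 && nextNull < row) {
        nextNull = nulls.nextSetBit(nextNull + 1);
      }
      if (nextNull == row) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    BitSet selected = new BitSet();
    for (int row : new int[]{0, 2, 3, 7}) {
      selected.set(row);
    }
    BitSet nulls = new BitSet();
    for (int row : new int[]{2, 5, 7}) {
      nulls.set(row);
    }
    System.out.println(countNullsWithGet(selected, nulls));      // prints 2
    System.out.println(countNullsWithIterator(selected, nulls)); // prints 2
  }
}
```

Both loops compute the same answer; the difference the benchmark measures is how much work each null check costs for a given bitmap implementation.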

Translating this into the heatmaps, the y axis shows the effect of differences in density of the null bitmap (bottom is a few null values, top is nearly all rows null), the x axis shows differences in the number of rows that our selector will select (left side selects very few rows, right scans nearly all rows), and the z axis is the difference in benchmark operation time between using bitmap.get and using an iterator (or peekable iterator) from the null bitmap to move along with the iterator on the selectivity bitset. Further, some of the heatmaps translate the raw benchmark times into time per row by dividing the time by the number of rows selected, to standardize measurement across the x axis and make it easier to compare the 2 strategies.
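As a rough illustration of the per-row scaling, the sketch below converts a JMH score in us/op into nanoseconds per selected row. It assumes the number of selected rows is approximately numRows * filterMatch, which is an estimate rather than the exact count the benchmark produced; nanosPerRow is a hypothetical helper, and the sample score is the concise iterator result for filterMatch 0.99999 and nullDensity 0.5 reported in this thread.

```java
final class PerRowCostSketch {
  /** Converts microseconds per operation into nanoseconds per selected row (estimate). */
  static double nanosPerRow(double microsPerOp, int numRows, double filterMatch) {
    double selectedRows = numRows * filterMatch; // assumption: expected number of selected rows
    return microsPerOp * 1000.0 / selectedRows;
  }

  public static void main(String[] args) {
    // 8397.707 us/op over ~499995 selected rows is roughly 16.8 ns per row
    System.out.printf("%.1f ns/row%n", nanosPerRow(8397.707, 500000, 0.99999));
  }
}
```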

Sorry, that didn't end up being so short... I .. hope this didn't make it more confusing 😜

clintropolis · 2019-11-05T08:36:09Z

To follow-up on the PR description, I let the concise benchmarks finish where nearly all of the rows are selected:

Benchmark                                                  (bitmapType)  (filterMatch)  (nullDensity)  (numRows)  Mode  Cnt          Score   Error  Units
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise        0.99999              0     500000  avgt    2       2438.318          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise        0.99999            0.1     500000  avgt    2  261150067.426          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise        0.99999           0.25     500000  avgt    2  430133586.339          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise        0.99999            0.5     500000  avgt    2  623322588.940          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise        0.99999           0.75     500000  avgt    2  449875260.568          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise        0.99999           0.99     500000  avgt    2   29015358.505          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise        0.99999              0     500000  avgt    2       2316.307          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise        0.99999            0.1     500000  avgt    2       4158.871          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise        0.99999           0.25     500000  avgt    2       6041.184          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise        0.99999            0.5     500000  avgt    2       8397.707          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise        0.99999           0.75     500000  avgt    2       6129.792          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise        0.99999           0.99     500000  avgt    2       3511.961          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise        0.99999              0     500000  avgt    2       2440.615          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise        0.99999            0.1     500000  avgt    2       4431.640          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise        0.99999           0.25     500000  avgt    2       6302.718          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise        0.99999            0.5     500000  avgt    2       8911.671          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise        0.99999           0.75     500000  avgt    2       6809.052          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise        0.99999           0.99     500000  avgt    2       4995.936          us/op

Using get with concise is just terribad. The results for 1% of rows are similarly pretty awful, though a couple orders of magnitude less bad:

Benchmark                                                  (bitmapType)  (filterMatch)  (nullDensity)  (numRows)  Mode  Cnt        Score   Error  Units
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise           0.01              0     500000  avgt    2      100.110          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise           0.01            0.1     500000  avgt    2  3878501.478          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise           0.01           0.25     500000  avgt    2  4853693.538          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise           0.01            0.5     500000  avgt    2  8399206.128          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise           0.01           0.75     500000  avgt    2  5169724.280          us/op
NullHandlingBitmapGetVsIteratorBenchmark.get                    concise           0.01           0.99     500000  avgt    2   440739.562          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise           0.01              0     500000  avgt    2       69.756          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise           0.01            0.1     500000  avgt    2     1813.610          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise           0.01           0.25     500000  avgt    2     3364.983          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise           0.01            0.5     500000  avgt    2     3745.194          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise           0.01           0.75     500000  avgt    2     2702.778          us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator               concise           0.01           0.99     500000  avgt    2     1874.390          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise           0.01              0     500000  avgt    2       77.929          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise           0.01            0.1     500000  avgt    2     1592.171          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise           0.01           0.25     500000  avgt    2     2216.227          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise           0.01            0.5     500000  avgt    2     3535.110          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise           0.01           0.75     500000  avgt    2     2400.203          us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator       concise           0.01           0.99     500000  avgt    2      436.322          us/op

I did not plot these because it seemed rather pointless; concise seems to be only viable when using IntIterator or PeekableIntIterator.

jnaous · 2019-11-05T15:12:33Z

Thanks Clint! These /are/ really good benchmarks. I'm curious what percentage of the entire cost for processing a row a null check would be so we can have a good idea of what % overhead we're talking about.


richardstartin · 2019-11-05T19:38:47Z

Cool heatmaps! Iteration should definitely perform better than calls to contains because it reduces the complexity of each call from logarithmic in the number of non empty 16 bit blocks to constant, obviously with a small set up cost. This is the same principle as #6764 which removes binary searches during bitmap construction.

It looks like the operation in question (VectorSelectorUtils.populateNullVector) is actually simulating the extraction of a mask from the bitmap at the offset of the current vector. Perhaps, with the ability to skip to the next non empty vector (given a vector width), it would be quite easy to implement this as an iterator which returns masks on each call to next. Perhaps you could plug this in to Druid's vectorized query engine. cc @lemire.

clintropolis · 2019-11-05T20:02:00Z

I'm curious what percentage of the entire cost for processing a row a null check would be so we can have a good idea of what % overhead we're talking about.

The last 3 animations show the estimated per row cost in nanoseconds for each of the 3 strategies. I will summarize:

get - most of the numbers look to be in the 10-25ns per row range (higher at low selectivity where it matters most)
IntIterator - about half are under 10ns per row (at low selectivity), this is definitely the best, but at super high selectivities (.1% of rows selected) with very dense bitmaps it climbs to over a couple of microseconds per row
PeekableIntIterator - about half are between 10-15ns per row (at low selectivity), most below 25ns, but also has more overhead with dense bitmaps at very high selectivity but only climbs to about 50-60ns per row in the worst case.

It kind of seems like a toss-up to me which is better between the PeekableIntIterator and the plain IntIterator. It almost seems worth it to eat the slow per-row times at high selectivity in exchange for that 5ns per row at low selectivity, but both approaches fare better at low selectivity than using get, so I went more conservative and used the PeekableIntIterator for now.

clintropolis · 2019-11-05T20:07:05Z

It looks like the operation in question (VectorSelectorUtils.populateNullVector) is actually simulating the extraction of a mask from the bitmap at the offset of the current vector. Perhaps, with the ability to skip to the next non empty vector (given a vector width), it would be quite easy to implement this as an iterator which returns masks on each call to next.

That sounds nice. I was planning on digging into this for future work to see if I could improve the null vector construction, but even just using the iterator should be an improvement over the existing code, so I didn't investigate that as part of this PR.

jnaous · 2019-11-05T21:29:11Z

Sorry my question wasn't clear. I meant in terms of the cost of the full operation on a row. For example, if we're doing a longSum operation on a column, what would the cost per row be, and what would the percentage overhead of this check be?


lemire · 2019-11-05T21:31:14Z

The plots are great indeed. Bravo!

clintropolis · 2019-11-05T21:31:43Z

For example, if we're doing a longSum operation on a column, what would the cost per row be, and what would the percentage overhead of this check be?

The benchmark added in this PR can't determine that, all I can provide is the fixed cost per row of having nulls. That seems useful to know, but also time intensive so I might not get to it as part of this PR.


clintropolis · 2019-11-06T20:31:12Z

I slightly adjusted the code to try to cut out a few instructions per loop, re-running benchmarks currently, and this time on an m5.large instead of my laptop, will add a comment and update the PR description once they are finished.

gianm · 2019-11-06T20:36:25Z

processing/src/main/java/org/apache/druid/collections/bitmap/PeekableIteratorAdapter.java

+public class PeekableIteratorAdapter<TIntIterator extends IntIterator> implements PeekableIntIterator
+{
+  final TIntIterator baseIterator;
+  Integer mark;


This might perform better if you make mark an int and then use -1 to signify not being set.
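A minimal sketch of that suggestion, assuming java.util.PrimitiveIterator.OfInt as a stand-in for the wrapped IntIterator. PeekableAdapterSketch is a hypothetical class written for illustration, not the adapter from this PR; it only shows how a primitive mark with a -1 sentinel avoids boxing, and it relies on values being non-negative.

```java
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

final class PeekableAdapterSketch {
  static final int NOT_SET = -1;

  private final PrimitiveIterator.OfInt base;
  private int mark = NOT_SET;  // primitive sentinel instead of a nullable Integer

  PeekableAdapterSketch(PrimitiveIterator.OfInt base) {
    this.base = base;
  }

  boolean hasNext() {
    return mark != NOT_SET || base.hasNext();
  }

  /** Returns the next value without consuming it; the base iterator throws if exhausted. */
  int peekNext() {
    if (mark == NOT_SET) {
      mark = base.nextInt();
    }
    return mark;
  }

  /** Returns the peeked value if there is one, otherwise pulls from the base iterator. */
  int next() {
    if (mark != NOT_SET) {
      int value = mark;
      mark = NOT_SET;
      return value;
    }
    return base.nextInt();
  }

  public static void main(String[] args) {
    PeekableAdapterSketch it = new PeekableAdapterSketch(IntStream.of(3, 8, 21).iterator());
    System.out.println(it.peekNext()); // prints 3
    System.out.println(it.next());     // prints 3
    System.out.println(it.next());     // prints 8
  }
}
```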

gianm · 2019-11-06T20:42:09Z

processing/src/main/java/org/apache/druid/segment/data/ColumnarDoubles.java

        @Override
        public boolean isNull()
        {
-          return nullValueBitmap.get(offset.getOffset());
+          final int i = offset.getOffset();
+          if (nullMark < i) {


If someone calls offset.reset() (topN does this sometimes) then this will be wrong. I think you need some logic to detect the offset going backwards, and resetting the iterator in that case.

This is a good point, I forgot about reset, will fix (hopefully without adding too many extra instructions since this is called in a hot loop 😢)
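One way the reset detection could look, sketched with java.util.BitSet standing in for the null bitmap. ResettableNullCheckSketch and its fields are hypothetical; the actual fix in the PR may track this differently. The idea is only that a backwards-moving offset restarts the cursor over the null bitmap.

```java
import java.util.BitSet;

final class ResettableNullCheckSketch {
  private final BitSet nulls;
  private int nextNull;          // next null row not yet passed, or -1 when exhausted
  private int lastOffset = -1;   // last offset checked, used to detect going backwards

  ResettableNullCheckSketch(BitSet nulls) {
    this.nulls = nulls;
    this.nextNull = nulls.nextSetBit(0);
  }

  boolean isNull(int offset) {
    if (offset < lastOffset) {
      // the offset went backwards (for example after a reset), so restart the null cursor
      nextNull = nulls.nextSetBit(0);
    }
    lastOffset = offset;
    while (nextNull >= 0 && nextNull < offset) {
      nextNull = nulls.nextSetBit(nextNull + 1);
    }
    return nextNull == offset;
  }

  public static void main(String[] args) {
    BitSet nulls = new BitSet();
    nulls.set(1);
    nulls.set(4);
    ResettableNullCheckSketch check = new ResettableNullCheckSketch(nulls);
    System.out.println(check.isNull(4)); // prints true
    System.out.println(check.isNull(5)); // prints false
    System.out.println(check.isNull(1)); // prints true, because the cursor restarted
  }
}
```

Without the backwards check, the last call would see an exhausted cursor and report false for a row that is null.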

gianm · 2019-11-06T20:43:07Z

processing/src/main/java/org/apache/druid/segment/vector/VectorSelectorUtils.java

@@ -47,14 +47,38 @@
      retVal = new boolean[offset.getMaxVectorSize()];
    }

-    // Probably not super efficient to call "get" so much, but, no worse than the non-vectorized version.


Take that, comment!!

gianm · 2019-11-06T20:43:57Z

processing/src/main/java/org/apache/druid/segment/vector/VectorSelectorUtils.java

    if (offset.isContiguous()) {
+      final int startOffset = offset.getStartOffset();


I think this also needs some logic to detect the offset getting reset.

gianm · 2019-11-06T20:46:06Z

processing/src/main/java/org/apache/druid/segment/vector/VectorSelectorUtils.java

    if (offset.isContiguous()) {
+      final int startOffset = offset.getStartOffset();
+      nullIterator.advanceIfNeeded(startOffset);


Is it correct to call advanceIfNeeded immediately after calling next, and then not update nextNullRow?

No, this is a mistake, I will fix (and make sure there is test coverage because this looks wrong).

gianm · 2019-11-06T20:47:54Z

processing/src/main/java/org/apache/druid/segment/vector/VectorSelectorUtils.java

      }
    } else {
+      final int[] currentOffsets = offset.getOffsets();
+      nullIterator.advanceIfNeeded(currentOffsets[0]);


Same question.

gianm · 2019-11-06T20:59:05Z

processing/src/main/java/org/apache/druid/segment/vector/VectorSelectorUtils.java

      for (int i = 0; i < offset.getCurrentVectorSize(); i++) {
-        retVal[i] = nullValueBitmap.get(offset.getOffsets()[i]);
+        final int row = currentOffsets[i];
+        if (row == nextNullRow) {


What if the nulls are [1, 2, 3] and the offsets are [1, 3, 4]? I think at some point nextNullRow will be 2, and row will be 3, which will cause a false to get written into retVal, which is wrong.

You were correct on this, I have fixed the implementation to only use advanceIfNeeded and peekNext which I think makes it a bit easier to follow, though maybe not quite as optimal since it might result in a few additional advanceIfNeeded operations. I also added tests for this method that catch this, since it wasn't actually covered otherwise previously.
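The nulls [1, 2, 3] with offsets [1, 3, 4] case can be worked through with a small sketch that uses only advanceIfNeeded and peekNext. Cursor is a hypothetical BitSet-backed stand-in for PeekableIntIterator, and populateNullVector here is a simplified illustration of the non-contiguous branch, not the code from VectorSelectorUtils.

```java
import java.util.Arrays;
import java.util.BitSet;

final class NullVectorSketch {
  /** Hypothetical peekable cursor over a BitSet of null rows. */
  static final class Cursor {
    private final BitSet bits;
    private int next;  // next null row, or -1 when exhausted

    Cursor(BitSet bits) {
      this.bits = bits;
      this.next = bits.nextSetBit(0);
    }

    boolean hasNext() {
      return next >= 0;
    }

    int peekNext() {
      return next;
    }

    /** Moves forward so that the next value is at least minVal; never moves backwards. */
    void advanceIfNeeded(int minVal) {
      if (next >= 0 && next < minVal) {
        next = bits.nextSetBit(minVal);
      }
    }
  }

  static boolean[] populateNullVector(int[] offsets, Cursor nullIterator) {
    boolean[] retVal = new boolean[offsets.length];
    for (int i = 0; i < offsets.length; i++) {
      int row = offsets[i];
      nullIterator.advanceIfNeeded(row);
      if (!nullIterator.hasNext()) {
        break;  // no nulls remain, so the rest of the vector stays false
      }
      retVal[i] = row == nullIterator.peekNext();
    }
    return retVal;
  }

  public static void main(String[] args) {
    BitSet nulls = new BitSet();
    nulls.set(1, 4);  // rows 1, 2, 3 are null
    boolean[] vector = populateNullVector(new int[]{1, 3, 4}, new Cursor(nulls));
    System.out.println(Arrays.toString(vector)); // prints [true, true, false]
  }
}
```

Because the cursor is advanced to each row before comparing, the unselected null at row 2 is skipped instead of being compared against row 3.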

jihoonson

Cool, thank you for the detailed benchmark results.

jihoonson · 2019-11-06T20:06:36Z

processing/src/main/java/org/apache/druid/collections/bitmap/PeekableIteratorAdapter.java

+
+  @Override
+  public int next()
+  {


nit: even though it's missing in the javadoc of IntIterator, it would be a good convention to throw NoSuchElementException if hasNext() returns false.

looking into the implementations of IntIterator this is wrapping, it looks like they will throw this exception, so I don't need to have the check here. I also removed the check from peekNext for the same reason.

jihoonson · 2019-11-06T20:08:23Z

processing/src/main/java/org/apache/druid/collections/bitmap/PeekableIteratorAdapter.java

+import org.roaringbitmap.IntIterator;
+import org.roaringbitmap.PeekableIntIterator;
+
+public class PeekableIteratorAdapter<TIntIterator extends IntIterator> implements PeekableIntIterator


Would you please add javadoc?


clintropolis · 2019-11-08T09:35:17Z

I repeated the benchmarks with the latest changes to make sure everything was still good, this time on an AWS m5.large, and the results appear to be approximately the same, with the peekable iterator faring ever so slightly better than last time around:

peekable iterator better than get:

get better than peekable iterator:

get-vs-iterator-vs-peekable-redux.csv.zip

gianm · 2019-11-08T15:43:07Z

...essing/src/main/java/org/apache/druid/collections/bitmap/ConcisePeekableIteratorAdapter.java

+  {
+    if (mark < i) {
+      baseIterator.skipAllBefore(i);
+      if (baseIterator.hasNext()) {


If i is past the end of the set, I'm guessing baseIterator.hasNext will be false and the mark will remain unchanged. That means next will return it, even though it's not >= i. Is that right?

oops good catch, if there is no next it should reset the mark
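A sketch of the corrected behavior, assuming a sorted int array as a hypothetical base in place of the concise iterator and its skipAllBefore method. AdvanceIfNeededSketch is illustrative only; the point is that the mark is cleared when nothing at or above i remains.

```java
final class AdvanceIfNeededSketch {
  static final int NOT_SET = -1;

  private final int[] sorted;  // hypothetical base: ascending, non-negative values
  private int pos = 0;
  private int mark = NOT_SET;

  AdvanceIfNeededSketch(int[] sorted) {
    this.sorted = sorted;
  }

  boolean hasNext() {
    return mark != NOT_SET || pos < sorted.length;
  }

  /** Callers are expected to check hasNext() first. */
  int peekNext() {
    if (mark == NOT_SET) {
      mark = sorted[pos++];
    }
    return mark;
  }

  void advanceIfNeeded(int i) {
    if (mark < i) {
      // skip every base value below i
      while (pos < sorted.length && sorted[pos] < i) {
        pos++;
      }
      if (pos < sorted.length) {
        mark = sorted[pos++];
      } else {
        mark = NOT_SET;  // nothing >= i remains, so drop the stale mark
      }
    }
  }

  public static void main(String[] args) {
    AdvanceIfNeededSketch it = new AdvanceIfNeededSketch(new int[]{2, 5});
    System.out.println(it.peekNext()); // prints 2
    it.advanceIfNeeded(10);            // past the end of the set
    System.out.println(it.hasNext());  // prints false
  }
}
```

If the else branch were missing, the mark would still hold 2 after advancing to 10, and hasNext would wrongly report true.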

gianm · 2019-11-08T15:53:08Z

processing/src/main/java/org/apache/druid/segment/data/ColumnarDoubles.java

+          if (offsets[0] < offsetMark) {
+            nullIterator = nullValueBitmap.peekableIterator();
+          }
+          offsetMark = offsets[0];


I think it'd make more sense to set it to the last value of offsets rather than the first. (With the first-offset approach, I think you'd see some issues if offset arrays on subsequent calls overlapped each other.)

oops, will fix and add a test

gianm · 2019-11-08T15:58:50Z

processing/src/main/java/org/apache/druid/segment/vector/VectorSelectorUtils.java

+        if (!nullIterator.hasNext()) {
+          break;
+        }
+        retVal[i] = row == nullIterator.peekNext();


Why not next() instead of peekNext()?

since the vectorized version doesn't save the nullMark like the other version, doing next here could consume a value that is outside of the range we are building the vector for, and we would lose it for the next vector that has that value.

gianm · 2019-11-08T16:02:39Z

processing/src/test/java/org/apache/druid/segment/data/NumericNullColumnSelectorTest.java

+  {
+    WrappedRoaringBitmap mutable = new WrappedRoaringBitmap();
+    for (int i = 0; i < numRows; i++) {
+      if (ThreadLocalRandom.current().nextDouble(0.0, 1.0) > 0.7) {


Maybe better to generate N random bitmaps with a fixed seed. That way, the test is deterministic (but still tests decent variety due to the N factor).
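A sketch of what seeded generation could look like, using java.util.Random and java.util.BitSet rather than WrappedRoaringBitmap. The helper name, seed, and parameters are illustrative assumptions, not what the test in this PR ended up using.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

final class SeededBitmapsSketch {
  /** Builds `count` random null bitmaps; the same seed always yields the same bitmaps. */
  static List<BitSet> randomBitmaps(long seed, int count, int numRows, double nullDensity) {
    Random random = new Random(seed);
    List<BitSet> bitmaps = new ArrayList<>(count);
    for (int n = 0; n < count; n++) {
      BitSet bitmap = new BitSet(numRows);
      for (int row = 0; row < numRows; row++) {
        if (random.nextDouble() < nullDensity) {
          bitmap.set(row);
        }
      }
      bitmaps.add(bitmap);
    }
    return bitmaps;
  }

  public static void main(String[] args) {
    List<BitSet> first = randomBitmaps(42L, 5, 1000, 0.3);
    List<BitSet> second = randomBitmaps(42L, 5, 1000, 0.3);
    System.out.println(first.equals(second)); // prints true
  }
}
```

A failure found with one of the N bitmaps can then be reproduced exactly by rerunning with the same seed.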

suneet-s · 2019-11-09T02:16:33Z

Sorry, that didn't end up being so short... I .. hope this didn't make it more confusing 😜

@clintropolis I finally got around to re-reading the discussion - this was very helpful. I think the part I missed initially was the z-axis in the first few heatmaps show the difference between 2 benchmark runs. I can finally take a pass at reading the code, now that I kinda understand what it's trying to do 😅

suneet-s

It would be nice to add unit tests for the PeekableIteratorAdapter classes since those seem like foundational classes

suneet-s · 2019-11-09T03:08:10Z

processing/src/main/java/org/apache/druid/collections/bitmap/PeekableIteratorAdapter.java

+public class PeekableIteratorAdapter<TIntIterator extends IntIterator> implements PeekableIntIterator
+{
+  static final int NOT_SET = -1;
+  final TIntIterator baseIterator;


Do you really want all of these to be package private instead of protected?

also nit: empty line between static and class variables

suneet-s · 2019-11-09T03:27:56Z

processing/src/main/java/org/apache/druid/collections/bitmap/PeekableIteratorAdapter.java

+  @Override
+  public void advanceIfNeeded(int i)
+  {
+    while (mark < i && baseIterator.hasNext()) {


is i always guaranteed to be positive?

yes, i is supplied by an Offset which should always be positive.

Thinking out loud here - feel free to ignore.

According to the interface javadocs, i is the minVal that we want to advance to. There is no guarantee that minVal will always be positive. It looks like right now we only pass in a positive integer, but it's possible someone could pass in negative values in the future? (if the baseIterator is a list of sorted integers - not sure if this is ever the case in Druid)

If we set mark to Integer.MIN_VALUE instead of -1 will that support negative numbers as well without a performance hit?

I'd suggest dealing with this, if at all, just through javadocs saying that these interfaces should only be used with nonnegative ints. In practice these iterators are used for iterating over bitmaps that represent row numbers. It's doubly-impossible to see negative numbers: bitmaps cannot store negative ints, and row numbers cannot be negative either.

Javadocs for this don't seem especially necessary to me. I can add them if there are any other changes I need to make to this PR; otherwise I'd prefer to skip to avoid churning through CI again.

suneet-s · 2019-11-09T03:41:12Z

processing/src/main/java/org/apache/druid/collections/bitmap/PeekableIteratorAdapter.java

+{
+  static final int NOT_SET = -1;
+  final TIntIterator baseIterator;
+  int mark = NOT_SET;


nit: mark was confusing for me to understand. Is this nextVal?

heh, I originally was calling this some form of 'next', however that was also confusing to me because it's actually the previous value of the baseIterator, just the next value of the adapter. So, I changed it to be mark since like, functionally this code is marking the position from the iterator it has consumed from to save it for the future. I guess I was thinking like how you mark the position on a buffer to save where you were so that you can go back to it?

Either is fine, maybe a javadoc explaining what it is.

suneet-s · 2019-11-09T03:46:17Z

processing/src/main/java/org/apache/druid/collections/bitmap/ImmutableBitmap.java

+   */
+  default PeekableIntIterator peekableIterator()
+  {
+    return new PeekableIteratorAdapter(iterator());


nit: IntelliJ is complaining about raw types here

return new PeekableIteratorAdapter<>(iterator());

thanks, will change

use peekable iterator for numeric column selector null checking instead of bitmap.get for those sweet sweet nanoseconds

e8c5644

clintropolis added Performance Area - Querying Area - Null Handling labels Nov 5, 2019

clintropolis added 4 commits November 5, 2019 16:06

Merge remote-tracking branch 'upstream/master' into numeric-null-peekable-iterator

53e8891

remove unused method

8d6133e

slight optimization i think

58f9803

remove clone from wrappers since we do not use and is confusing

175b809

gianm reviewed Nov 6, 2019


jihoonson reviewed Nov 6, 2019


clintropolis added 4 commits November 7, 2019 03:22

fixes and tests

e1a98a7

int instead of Integer

7d9063e

Merge remote-tracking branch 'upstream/master' into numeric-null-peekable-iterator

e14d61b

fix it

d6cd124

gianm reviewed Nov 8, 2019


fixes, more tests

52a6f3b

suneet-s reviewed Nov 9, 2019


fix

0dbc0cc

gianm approved these changes Nov 13, 2019


gianm merged commit 9ed9a80 into apache:master Nov 13, 2019

clintropolis deleted the numeric-null-peekable-iterator branch November 13, 2019 20:29

clintropolis mentioned this pull request Nov 18, 2019

document SQL compatible null handling mode #8894

Merged

2 tasks

jon-wei added this to the 0.17.0 milestone Dec 17, 2019

a2l007 mentioned this pull request Feb 12, 2020

Performance degradation in topN queries when SQL-compatible null handling is enabled #9321

Open

This was referenced Apr 24, 2023

Null handling bitmap performance #5569

Closed

Slow double dimension query when useDefaultValueForNull set to false #8171

Closed

clintropolis mentioned this pull request Apr 17, 2024

use PeekableIntIterator for OR filter "partial index" value matchers #16300

Merged

4 tasks
