Consider using RoaringBitmapWriter for bitmap construction #6764

Merged: 1 commit, Jan 9, 2019

richardstartin
Member

I noticed you are investigating RoaringBitmap 0.7.30. If you upgrade, it is worth considering a different mechanism for building your bitmaps, one which favours ordered insertions by buffering them into a container and appending that container to the bitmap as late as possible (just before the bitmap is queried, or when the next multiple of 2^16 is crossed). There is no need to run-optimise bitmaps built this way manually, because each container is run-optimised when it is appended to the bitmap.
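To make the buffering idea concrete, here is a minimal toy model in plain Java. It is a sketch only and is not how RoaringBitmap implements RoaringBitmapWriter: the class `BufferedOrderedWriter`, its fields, and the use of `java.util.BitSet` as the buffer are all assumptions made for illustration. It shows values being collected in one buffer and appended to the stored chunks only when a multiple of 2^16 is crossed or just before a query.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of a buffering writer (hypothetical; not the RoaringBitmap implementation). */
final class BufferedOrderedWriter {
  /** Number of times a finished chunk was appended to the stored bitmap. */
  int appends = 0;
  private final List<Integer> keys = new ArrayList<>();      // high 16 bits of each stored chunk
  private final List<BitSet> containers = new ArrayList<>(); // low 16 bits of each stored chunk
  private BitSet buffer = new BitSet(1 << 16);
  private int currentKey = 0;

  /** Assumes values arrive in ascending order; this toy does not handle anything else. */
  void add(int value) {
    int key = value >>> 16;
    if (key != currentKey) {
      flush();           // crossed a multiple of 2^16: append the finished chunk
      currentKey = key;
    }
    buffer.set(value & 0xFFFF);
  }

  /** Appends the buffered chunk, if any, to the stored chunks. */
  void flush() {
    if (!buffer.isEmpty()) {
      keys.add(currentKey);
      containers.add(buffer);
      buffer = new BitSet(1 << 16);
      appends++;
    }
  }

  boolean contains(int value) {
    flush(); // make buffered values visible just before the query
    int i = keys.indexOf(value >>> 16);
    return i >= 0 && containers.get(i).get(value & 0xFFFF);
  }

  public static void main(String[] args) {
    BufferedOrderedWriter w = new BufferedOrderedWriter();
    for (int v = 0; v < 200_000; v += 3) {
      w.add(v);
    }
    System.out.println(w.contains(199_998) + " appends=" + w.appends);
  }
}
```

The point of the design is that an ordered stream only ever touches the buffer and the tail of the chunk list, so no search is needed on the write path.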

I am not a druid user and am unlikely to become one soon, so this PR is intended as an FYI about the feature only.

I ran BitmapIterationBenchmark.constructAndIter on JDK 8 on Ubuntu 16.04 at 7a09cde4de1953eee75c5033e863cfde8f94d6c1 and got:

Benchmark                                     (bitmapAlgo)  (n)  (prob)   (size)  Mode  Cnt          Score          Error  Units
BitmapIterationBenchmark.constructAndIter           bitset  N/A     0.0  1000000  avgt    5         11.958 ±        0.411  ns/op
BitmapIterationBenchmark.constructAndIter           bitset  N/A   0.001  1000000  avgt    5      55820.663 ±     4765.600  ns/op
BitmapIterationBenchmark.constructAndIter           bitset  N/A     0.1  1000000  avgt    5     853821.933 ±    10916.693  ns/op
BitmapIterationBenchmark.constructAndIter           bitset  N/A     0.5  1000000  avgt    5    3014089.931 ±    65283.409  ns/op
BitmapIterationBenchmark.constructAndIter           bitset  N/A    0.99  1000000  avgt    5    5628379.542 ±   227488.488  ns/op
BitmapIterationBenchmark.constructAndIter           bitset  N/A     1.0  1000000  avgt    5    5612304.605 ±    54199.692  ns/op
BitmapIterationBenchmark.constructAndIter          concise  N/A     0.0  1000000  avgt    5          8.073 ±        0.178  ns/op
BitmapIterationBenchmark.constructAndIter          concise  N/A   0.001  1000000  avgt    5      27473.710 ±      626.528  ns/op
BitmapIterationBenchmark.constructAndIter          concise  N/A     0.1  1000000  avgt    5    3635751.625 ±    56888.246  ns/op
BitmapIterationBenchmark.constructAndIter          concise  N/A     0.5  1000000  avgt    5    9798233.678 ±   237069.198  ns/op
BitmapIterationBenchmark.constructAndIter          concise  N/A    0.99  1000000  avgt    5    9588921.943 ±   214705.602  ns/op
BitmapIterationBenchmark.constructAndIter          concise  N/A     1.0  1000000  avgt    5    8077899.901 ±   118088.071  ns/op
BitmapIterationBenchmark.constructAndIter          roaring  N/A     0.0  1000000  avgt    5        131.791 ±        2.237  ns/op
BitmapIterationBenchmark.constructAndIter          roaring  N/A   0.001  1000000  avgt    5      46860.367 ±     1753.583  ns/op
BitmapIterationBenchmark.constructAndIter          roaring  N/A     0.1  1000000  avgt    5    1709465.928 ±    38854.875  ns/op
BitmapIterationBenchmark.constructAndIter          roaring  N/A     0.5  1000000  avgt    5    6898408.274 ±   210501.998  ns/op
BitmapIterationBenchmark.constructAndIter          roaring  N/A    0.99  1000000  avgt    5   13340397.558 ±   283832.841  ns/op
BitmapIterationBenchmark.constructAndIter          roaring  N/A     1.0  1000000  avgt    5   13415893.194 ±   170437.084  ns/op

At b313193c81ed868a9afe04c658306705f63daaef I got:

Benchmark                                  (bitmapAlgo)  (prob)   (size)  Mode  Cnt         Score         Error  Units
BitmapIterationBenchmark.constructAndIter        bitset     0.0  1000000  avgt    5        12.665 ±       1.104  ns/op
BitmapIterationBenchmark.constructAndIter        bitset   0.001  1000000  avgt    5     74471.073 ±   54445.042  ns/op
BitmapIterationBenchmark.constructAndIter        bitset     0.1  1000000  avgt    5    887366.201 ±   35143.327  ns/op
BitmapIterationBenchmark.constructAndIter        bitset     0.5  1000000  avgt    5   3166248.403 ±  495669.632  ns/op
BitmapIterationBenchmark.constructAndIter        bitset    0.99  1000000  avgt    5   6324809.163 ± 1012027.080  ns/op
BitmapIterationBenchmark.constructAndIter        bitset     1.0  1000000  avgt    5   5913067.177 ±  132629.211  ns/op
BitmapIterationBenchmark.constructAndIter       concise     0.0  1000000  avgt    5         8.068 ±       0.115  ns/op
BitmapIterationBenchmark.constructAndIter       concise   0.001  1000000  avgt    5     27547.146 ±     546.018  ns/op
BitmapIterationBenchmark.constructAndIter       concise     0.1  1000000  avgt    5   3635772.683 ±   56079.798  ns/op
BitmapIterationBenchmark.constructAndIter       concise     0.5  1000000  avgt    5  10400194.474 ±  147495.368  ns/op
BitmapIterationBenchmark.constructAndIter       concise    0.99  1000000  avgt    5   9409295.891 ±  122748.484  ns/op
BitmapIterationBenchmark.constructAndIter       concise     1.0  1000000  avgt    5   8641773.847 ±  193212.416  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     0.0  1000000  avgt    5        75.142 ±       1.408  ns/op
BitmapIterationBenchmark.constructAndIter       roaring   0.001  1000000  avgt    5     13348.323 ±     205.987  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     0.1  1000000  avgt    5   1300962.745 ±   30107.110  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     0.5  1000000  avgt    5   5255974.593 ±  133759.112  ns/op
BitmapIterationBenchmark.constructAndIter       roaring    0.99  1000000  avgt    5  11122438.742 ±  168793.197  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     1.0  1000000  avgt    5  11555370.664 ±  141606.255  ns/op

These quick results are quite noisy, so they require more careful consideration on your part. I also don't know how likely out-of-order insertions are in a typical Druid workload; for those there would be no penalty for using this abstraction, but no benefit either.

@b-slim
Contributor

b-slim commented Dec 20, 2018

@richardstartin can you please fix

Selected: Unused declaration (1)
processing/src/main/java/org/apache/druid/collections/bitmap
WrappedRoaringBitmap.java (1)
54: WrappedRoaringBitmap() Parameter compressRunOnSerialization is not used in either this method or any of its derived methods

to have a clean build. Thanks

@richardstartin
Member Author

richardstartin commented Dec 20, 2018

@b-slim this is more of a working heads-up (i.e. I think you can probably build your roaring bitmaps faster) than something I would expect you to merge. Because run compression can be applied incrementally, you don't need to do it when you serialise the bitmaps, and that ripples out fairly quickly to the "BitmapSerdeFactory" JSON formats. I can't offer a view on how they should be kept backward compatible. If you update, please feel free to use this commit as a reference.
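As a rough illustration of what run compression of a single container involves, the sketch below counts runs of consecutive values and compares approximate encoded sizes. The class and method names are hypothetical, and the byte counts (2 bytes per value for an array, 8192 bytes for a 2^16-bit bitmap, 2 + 4 bytes per run for a run encoding) are simplifying assumptions for this example, not a statement of the library's exact serialised format.

```java
/** Simplified size comparison for one 2^16-value chunk (hypothetical sketch). */
final class RunEncodingSketch {
  /** Counts maximal runs of consecutive integers in a sorted, duplicate-free array. */
  static int countRuns(int[] sorted) {
    if (sorted.length == 0) {
      return 0;
    }
    int runs = 1;
    for (int i = 1; i < sorted.length; i++) {
      if (sorted[i] != sorted[i - 1] + 1) {
        runs++; // a gap starts a new run
      }
    }
    return runs;
  }

  /** True when the assumed run encoding is smaller than both alternatives. */
  static boolean preferRuns(int[] sorted) {
    int runBytes = 2 + 4 * countRuns(sorted); // assumed: header + (start, length) per run
    int arrayBytes = 2 * sorted.length;       // assumed: 16 bits per stored value
    int bitmapBytes = 8192;                   // 2^16 bits
    return runBytes < Math.min(arrayBytes, bitmapBytes);
  }

  public static void main(String[] args) {
    System.out.println(preferRuns(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}));
    System.out.println(preferRuns(new int[] {1, 3, 5, 7}));
  }
}
```

Because the decision only needs the values of one chunk, it can be made as each chunk is appended rather than in a separate pass at serialisation time.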

Member

@clintropolis clintropolis left a comment

@richardstartin Thanks for taking enough interest to relay this information! Outside of benchmarks, I believe all of the bitmaps we construct are built in order, so if I understand you correctly there would be little benefit to making this change, and I suspect the main difference in the original benchmarks you collected came from the performance improvements in the version bump, which I saw as well.

That said, if there is no harm either, I don't see any reason not to make the switch, and it could be useful in the event we find ourselves needing to create out-of-order bitmaps. But maybe we should repeat the benchmarks against the latest master to ensure there is no penalty for this change?

I am unsure what the best thing to do is with regard to the compressRunOnSerialization parameter. I can't imagine the resulting segment size is very viable without it, but yes, it's very hard and annoying to remove things like that.

@richardstartin
Member Author

@clintropolis the change optimises ordered insertion, because it avoids binary search on the high 16 bits of the bitmap. The change is useless for unordered insertions.
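The difference between the two write paths can be sketched as follows. This is an assumption-laden illustration rather than library code: `HighBitsLookup`, `findChunk`, and `fitsLastChunk` are hypothetical names, and the sorted `int[]` of keys merely stands in for whatever index a bitmap keeps over the high 16 bits of its values.

```java
import java.util.Arrays;

/** Contrasts a per-insert binary search with the ordered append check (hypothetical sketch). */
final class HighBitsLookup {
  /** Direct insertion: locate the chunk for a value by binary search over the sorted keys. */
  static int findChunk(int[] sortedKeys, int size, int value) {
    return Arrays.binarySearch(sortedKeys, 0, size, value >>> 16);
  }

  /** Ordered insertion: only the last key ever needs to be compared. */
  static boolean fitsLastChunk(int[] sortedKeys, int size, int value) {
    return size > 0 && sortedKeys[size - 1] == (value >>> 16);
  }

  public static void main(String[] args) {
    int[] keys = {0, 1, 2, 3};
    System.out.println(findChunk(keys, keys.length, (2 << 16) + 5));
    System.out.println(fitsLastChunk(keys, keys.length, (3 << 16) + 1));
  }
}
```

With ordered input, the constant-time check replaces a logarithmic search on every insertion; with unordered input the check fails most of the time, which is why nothing is gained in that case.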

@clintropolis
Member

Ah, I misread the PR description, 👍

@richardstartin
Member Author

richardstartin commented Jan 2, 2019

@clintropolis You were right about where most of the performance came from. I cleaned up the commits a bit and ran against master.

Here's the benchmark on master at 114a9fc38feda5f85799d24889007bc572d04dea, with RoaringBitmap 0.7.30:

Benchmark                                  (bitmapAlgo)  (prob)   (size)  Mode  Cnt         Score         Error  Units
BitmapIterationBenchmark.constructAndIter       roaring     0.0  1000000  avgt    5       130.624 ±       2.645  ns/op
BitmapIterationBenchmark.constructAndIter       roaring   0.001  1000000  avgt    5     17553.925 ±    1177.041  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     0.1  1000000  avgt    5   1704213.394 ±   51487.534  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     0.5  1000000  avgt    5   6831889.531 ±  146377.716  ns/op
BitmapIterationBenchmark.constructAndIter       roaring    0.99  1000000  avgt    5  13106844.584 ±  661339.555  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     1.0  1000000  avgt    5  15204652.686 ± 1441562.179  ns/op

Here's a slight improvement on this branch at 1afb602de27d31367440b1cccc86ec799c59dc4c owing to reduced construction times.

Benchmark                                  (bitmapAlgo)  (prob)   (size)  Mode  Cnt         Score        Error  Units
BitmapIterationBenchmark.constructAndIter       roaring     0.0  1000000  avgt    5       189.940 ±      3.313  ns/op
BitmapIterationBenchmark.constructAndIter       roaring   0.001  1000000  avgt    5     13719.152 ±     42.376  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     0.1  1000000  avgt    5   1268587.758 ±  42864.087  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     0.5  1000000  avgt    5   4658899.187 ± 163463.751  ns/op
BitmapIterationBenchmark.constructAndIter       roaring    0.99  1000000  avgt    5  10556288.928 ± 212975.696  ns/op
BitmapIterationBenchmark.constructAndIter       roaring     1.0  1000000  avgt    5  11036729.972 ± 346125.258  ns/op

PS: this has been squashed and force-pushed, so please take another look.

@richardstartin
Member Author

In fact, here's a benchmark to isolate the construction time from the iteration time:

  @Benchmark
  public Object construct(ConstructAndIterState state)
  {
    int dataSize = state.dataSize;
    int[] data = state.data;
    MutableBitmap mutableBitmap = factory.makeEmptyMutableBitmap();
    for (int i = 0; i < dataSize; i++) {
      mutableBitmap.add(data[i]);
    }
    return factory.makeImmutableBitmap(mutableBitmap);
  }

and at 114a9fc38feda5f85799d24889007bc572d04dea (master) I get

Benchmark                           (bitmapAlgo)  (prob)   (size)  Mode  Cnt         Score        Error  Units
BitmapIterationBenchmark.construct       roaring     0.0  1000000  avgt    5       124.236 ±      6.597  ns/op
BitmapIterationBenchmark.construct       roaring   0.001  1000000  avgt    5     14045.045 ±   1026.400  ns/op
BitmapIterationBenchmark.construct       roaring     0.1  1000000  avgt    5   1317274.340 ± 153275.511  ns/op
BitmapIterationBenchmark.construct       roaring     0.5  1000000  avgt    5   7415001.388 ± 377532.457  ns/op
BitmapIterationBenchmark.construct       roaring    0.99  1000000  avgt    5  10687372.213 ± 860813.095  ns/op
BitmapIterationBenchmark.construct       roaring     1.0  1000000  avgt    5  10790794.579 ± 961663.105  ns/op

And at 1afb602de27d31367440b1cccc86ec799c59dc4c (this PR) I get:

Benchmark                           (bitmapAlgo)  (prob)   (size)  Mode  Cnt        Score        Error  Units
BitmapIterationBenchmark.construct       roaring     0.0  1000000  avgt    5      187.924 ±     12.126  ns/op
BitmapIterationBenchmark.construct       roaring   0.001  1000000  avgt    5    12674.625 ±    267.506  ns/op
BitmapIterationBenchmark.construct       roaring     0.1  1000000  avgt    5   868981.551 ±  21139.938  ns/op
BitmapIterationBenchmark.construct       roaring     0.5  1000000  avgt    5  3391372.332 ±  86345.168  ns/op
BitmapIterationBenchmark.construct       roaring    0.99  1000000  avgt    5  7314761.225 ± 335907.952  ns/op
BitmapIterationBenchmark.construct       roaring     1.0  1000000  avgt    5  7147180.460 ± 107730.956  ns/op

So there's a good boost to be had from avoiding the binary searches, and it sounds like druid does write its bitmaps in an ordered fashion.
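For a quick sense of scale, the ratio of the two sets of `construct` scores can be computed directly. The sketch below copies the mean scores from the two tables above and ignores the reported error margins, so the ratios are indicative only; the class name `ConstructSpeedup` is made up for this example.

```java
/** Ratio of master to PR mean scores from the construct benchmark (errors ignored). */
final class ConstructSpeedup {
  static final double[] PROB = {0.0, 0.001, 0.1, 0.5, 0.99, 1.0};
  // Mean ns/op at 114a9fc38feda5f85799d24889007bc572d04dea (master).
  static final double[] MASTER_NS = {
    124.236, 14045.045, 1317274.340, 7415001.388, 10687372.213, 10790794.579
  };
  // Mean ns/op at 1afb602de27d31367440b1cccc86ec799c59dc4c (this PR).
  static final double[] PR_NS = {
    187.924, 12674.625, 868981.551, 3391372.332, 7314761.225, 7147180.460
  };

  /** Values above 1 mean the PR is faster for that row. */
  static double speedup(int row) {
    return MASTER_NS[row] / PR_NS[row];
  }

  public static void main(String[] args) {
    for (int i = 0; i < PROB.length; i++) {
      System.out.printf("prob=%.3f speedup=%.2fx%n", PROB[i], speedup(i));
    }
  }
}
```

On these means the PR is about 2.2x faster at prob 0.5, about 1.5x faster at 0.1, 0.99 and 1.0, about 1.1x faster at 0.001, and slower for the empty bitmap at prob 0.0.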

4 participants