
Druid 0.13 ~ 0.18 version roaringbitmap benchmark becomes slow #9920

Closed
yinghao-wang opened this issue May 25, 2020 · 1 comment · Fixed by #9987

Affected Version

Druid 0.13.0 ~ Druid 0.18.0

I pulled down the Druid 0.13 and Druid 0.18 releases from the community, compiled them, and ran the RoaringBitmap benchmark in each (the command is `java -jar benchmarks.jar BitmapIterationBenchmark -p bitmapAlgo="roaring"`). The Druid 0.13 benchmark finishes within ten minutes, while the Druid 0.18 benchmark needs more than an hour, and the results are not what I expected: there are two cases where Druid 0.18 is much slower than Druid 0.13.

Druid 0.13 uses RoaringBitmap 0.5.18, and Druid 0.18 uses RoaringBitmap 0.8.11.
All data below comes from a Mac with 6 CPUs and 16 GB of memory.
I have also tried it on a server with 48 CPUs and 250 GB of memory with similar results, so the numbers are not specific to the machine.

Benchmark test results of Druid 0.13:
[screenshot]
Benchmark test results of Druid 0.18:
[screenshots]

The benchmarks above were run many times and the results were consistent.

Below is my troubleshooting process.

Phenomenon 1: the total benchmark time becomes longer
The benchmark execution slowed down between 0.15 and 0.16 (I am not sure why). Running `-p bitmapAlgo="roaring" -p n=100 -p prob=0.1` alone, the benchmark time increased from 40 s (0.15) to 6 min (0.16).
[screenshot]
Phenomenon 2: the average time of two specific cases becomes larger
In the two intersectionAndIter cases (n=100, prob=0.1) and (n=100, prob=0.5), the score rose sharply from 0.13 to 0.14: from roughly 3-5 million (0.13) to 100-200 million (0.14).
Case 1 (n=100, prob=0.1)
[screenshot]

Case 2 (n=100, prob=0.5)
[screenshot]

Although phenomenon 1 slows down the benchmark execution, I don't think it reflects a real performance problem. What really matters is the score of the two cases in phenomenon 2, so I will try to analyze why the score of those two cases rises.

Analyzing the rise of the score in phenomenon 2 from the test data:
Starting from the Druid 0.13 code, I swapped in different RoaringBitmap versions, and the test results are recorded below:
[screenshots: recorded test results]

From the data in the table, after community change #6764 is applied, the score of the above two cases rises significantly.

Analyzing the rise of the score from the flame graphs:
I modified the benchmark code to run only the two slower cases from the figure above (prob=0.1 and prob=0.5), ran them multiple times on Druid 0.13 and Druid 0.14, and captured the flame graphs below. Because the benchmark's fork is set to 1, each case forks a child process to run in, so the flame graphs capture not only the main process but also the child process of each specific case:
Druid 0.13 benchmark main process
[screenshot]
Druid 0.13 child process: intersectionAndIter with prob=0.1
[screenshot]
Druid 0.13 child process: intersectionAndIter with prob=0.5
[screenshot]
Druid 0.14 benchmark main process
[screenshot]
Druid 0.14 child process: intersectionAndIter with prob=0.1
[screenshot]
Druid 0.14 child process: intersectionAndIter with prob=0.5
[screenshot]
Druid 0.18 child process: intersectionAndIter with prob=0.1
[screenshot]
Druid 0.18 child process: intersectionAndIter with prob=0.5
[screenshot]

The flame graphs above were captured multiple times; one capture of each was picked at random.
From the flame graphs it can be seen that different Druid versions use different RoaringBitmap container types when taking intersections. Druid 0.13 uses ArrayContainer or BitmapContainer, while Druid 0.14 and later use RunContainer (both the 0.14 and 0.18 flame graphs show RunContainer, so I speculate that all versions from 0.14 to 0.18 use RunContainer).
I think RunContainer's compression efficiency is poor under this benchmark, because looking at the code that constructs the fake data, the distribution is very sparse and the values are not contiguous.

Code to construct fake data:
[screenshots: benchmark setup code]

Observing the setup code, the initialized bitmaps are very sparse and the values are not contiguous. In this scenario I think RunContainer does not compress as well as BitmapContainer or ArrayContainer, which may cause performance problems.
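To illustrate the sparsity point, here is a minimal standalone sketch against the public RoaringBitmap API (my own approximation, not the benchmark's setup code; the class name, value counts, and densities are made up for illustration). It compares how much `runOptimize()` shrinks a bitmap of contiguous values versus a bitmap of sparse, non-contiguous random values:

```java
import java.util.Random;
import org.roaringbitmap.RoaringBitmap;

// Sketch: compare serialized size before/after runOptimize() for
// (a) contiguous values, which run-encode very well, and
// (b) sparse random values, which do not.
public class RunEncodingSize {
  static void report(String label, RoaringBitmap bitmap) {
    long before = bitmap.serializedSizeInBytes();
    bitmap.runOptimize();                      // converts a container only when runs are smaller
    long after = bitmap.serializedSizeInBytes();
    System.out.println(label + ": " + before + " -> " + after + " bytes");
  }

  public static void main(String[] args) {
    RoaringBitmap contiguous = new RoaringBitmap();
    for (int i = 0; i < 1_000_000; i++) {
      contiguous.add(i);                       // one long block of consecutive values
    }

    Random rnd = new Random(42);
    RoaringBitmap sparse = new RoaringBitmap();
    for (int i = 0; i < 10_000_000; i++) {
      if (rnd.nextDouble() < 0.01) {           // sparse, non-contiguous values
        sparse.add(i);
      }
    }

    report("contiguous values   ", contiguous);
    report("sparse random values", sparse);
  }
}
```

On sparse, non-contiguous data like this, the run encoding is usually no smaller than the array or bitmap representation, so `runOptimize()` leaves those containers alone; only the contiguous bitmap shrinks.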

Conclusion: the community change in #6764 may be related to RunContainer being used when the RoaringBitmap benchmark data is constructed. The impact of #6764 needs further analysis; I hope others can help look into these problems. The above is just my personal analysis.

@richardstartin (Member) commented May 25, 2020

The code change in #6764 aimed to improve the performance of bitmap construction, and it applies run compression as the bitmap is built, but only to containers which are run-compressible. If this results in slower intersections in some cases, this can be fixed in the RoaringBitmap library; some changes were recently made to significantly improve run vs bitmap `andCardinality` and `intersects`, and these changes are applicable to run vs bitmap `and`.
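For anyone who wants to poke at the run vs bitmap intersection cost outside the Druid benchmark, a rough standalone sketch might look like the following (crude `System.nanoTime()` timing rather than JMH, and the class name and data shapes are invented for illustration). It intersects the same run-compressible data, once with run compression applied and once with it stripped, against a dense bitmap-backed operand:

```java
import java.util.Random;
import org.roaringbitmap.RoaringBitmap;

// Rough sketch (not a JMH benchmark): compare RoaringBitmap.and() when one
// operand uses run containers versus the same data without run compression.
public class RunVsBitmapAnd {
  public static void main(String[] args) {
    // Run-compressible operand: many short blocks of consecutive values.
    RoaringBitmap blocks = new RoaringBitmap();
    for (long start = 0; start < 10_000_000L; start += 1_000) {
      blocks.add(start, start + 500);          // 500 consecutive values per block
    }
    RoaringBitmap withRuns = blocks.clone();
    withRuns.runOptimize();                    // use run containers where they are smaller
    RoaringBitmap withoutRuns = blocks.clone();
    withoutRuns.removeRunCompression();        // ensure no run containers remain

    // Dense random operand, which ends up in bitmap containers.
    Random rnd = new Random(7);
    RoaringBitmap dense = new RoaringBitmap();
    for (int i = 0; i < 10_000_000; i++) {
      if (rnd.nextDouble() < 0.3) {
        dense.add(i);
      }
    }

    time("run vs bitmap and   ", withRuns, dense);
    time("bitmap vs bitmap and", withoutRuns, dense);
  }

  static void time(String label, RoaringBitmap a, RoaringBitmap b) {
    long start = System.nanoTime();
    long cardinality = 0;
    for (int i = 0; i < 100; i++) {
      cardinality += RoaringBitmap.and(a, b).getCardinality();
    }
    System.out.println(label + ": " + (System.nanoTime() - start) / 1_000_000
        + " ms (intersection cardinality " + cardinality / 100 + ")");
  }
}
```

This is only a sanity check, not a substitute for the BitmapIterationBenchmark numbers above.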

Note that #6764 will not have changed any behaviour when the roaring bitmaps have been memory mapped from disk and run compression has already been applied (immediately before serialisation), which might explain why no regression has otherwise come to light since the change...
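To make the memory-mapped case concrete, here is a minimal sketch of the serialize-then-map round trip (using a heap `ByteBuffer` as a stand-in for an actual memory-mapped segment file; the class name and values are illustrative, and this is not Druid's own serialization code). Run compression is applied immediately before serialization, and the `ImmutableRoaringBitmap` view over the buffer simply reflects whatever representation was written:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import org.roaringbitmap.RoaringBitmap;
import org.roaringbitmap.buffer.ImmutableRoaringBitmap;

// Sketch: serialize a run-optimized bitmap and read it back as an
// ImmutableRoaringBitmap over a ByteBuffer (stand-in for a memory-mapped file).
public class MappedBitmapSketch {
  public static void main(String[] args) throws IOException {
    RoaringBitmap bitmap = new RoaringBitmap();
    bitmap.add(0L, 100_000L);                  // run-compressible data
    bitmap.runOptimize();                      // apply run compression before serialization

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    bitmap.serialize(new DataOutputStream(bytes));

    // In Druid the buffer would come from a memory-mapped segment file; here we
    // simply wrap the serialized bytes.
    ByteBuffer buffer = ByteBuffer.wrap(bytes.toByteArray());
    ImmutableRoaringBitmap mapped = new ImmutableRoaringBitmap(buffer);
    System.out.println("cardinality read from buffer: " + mapped.getCardinality());
  }
}
```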
