Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance regression from #2753 in IndexMerger #2841

Merged
merged 1 commit into from
Apr 15, 2016

Conversation

jon-wei
Copy link
Contributor

@jon-wei jon-wei commented Apr 15, 2016

Fixes #2804

Reordering the bitset.or(nullRowsList.get(i)); call affects the performance of ConciseSet.add():

public class ConciseSet extends AbstractIntSet implements java.io.Serializable
...
  @Override
  public boolean add(int e)
  {
    modCount++;

    // range check
    if (e < ConciseSetUtils.MIN_ALLOWED_SET_BIT || e > ConciseSetUtils.MAX_ALLOWED_INTEGER) {
      throw new IndexOutOfBoundsException(String.valueOf(e));
    }

    // the element can be simply appended
    if (e > last) {
      append(e);
      return true;
    }

ConciseSet add() can optimize when an element being added is an "append".

When the bitset.or() call is placed before this block below in IndexMerger, it affects the optimization of subsequent bitset.add(row) calls.

for (Integer row : CombiningIterable.createSplatted(
               convertedInverteds,
               Ordering.<Integer>natural().nullsFirst()
           )) {
             if (row != INVALID_ROW) {
               bitset.add(row);
             }
           }

performance numbers using this benchmark:
https://github.com/jon-wei/druid/blob/datagen/benchmarks/src/main/java/io/druid/benchmark/indexing/IndexMergeBenchmark.java

The benchmark creates 5 randomly generated segments with 75,000 rows each and times how long it takes to merge them, for a total of 375,000 rows.

There are 5 dimensions, 1 non-null dimension and 4 dimensions comprised of ~90-95% null rows.

merging without reordering fix:

Benchmark                      Mode  Cnt         Score        Error  Units
IndexMergeBenchmark.merge    avgt   15  16045637.529 ± 284039.576  us/op
IndexMergeBenchmark.mergeV9  avgt   15   2749360.023 ±  40849.933  us/op

merging with null row handling in question removed:

Benchmark                      Mode  Cnt        Score       Error  Units
IndexMergeBenchmark.merge    avgt   15  3683830.050 ± 92200.397  us/op
IndexMergeBenchmark.mergeV9  avgt   15  2765897.113 ± 34061.016  us/op

merge with fix (reorder the calls)

Benchmark                    Mode  Cnt        Score       Error  Units
IndexMergeBenchmark.merge    avgt   15  3727564.239 ± 51567.596  us/op
IndexMergeBenchmark.mergeV9  avgt   15  2777533.755 ± 56258.551  us/op
``

@jon-wei jon-wei added the Bug label Apr 15, 2016
@jon-wei jon-wei added this to the 0.9.1 milestone Apr 15, 2016
@gianm
Copy link
Contributor

gianm commented Apr 15, 2016

👍

1 similar comment
@binlijin
Copy link
Contributor

👍

@fjy fjy merged commit b534f72 into apache:master Apr 15, 2016
@jon-wei jon-wei deleted the nullrows_merger_fix branch October 6, 2017 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance regression introduced by #2753
4 participants