Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow up to 4GB per bitmap index #8796

Merged
merged 1 commit into from
Jun 1, 2022

Conversation

richardstartin
Copy link
Member

This doubles the permissible size of a bitmap index in a backward compatible way by using the extra bit which is currently always clear because writing fails when the index size exceeds 2GB. An index size in excess of 2GB isn't recommended, but this is more tolerant than throwing an exception, and is a backward compatible change.

return new ImmutableRoaringBitmap(_bitmapBuffer.toDirectByteBuffer(offset - _firstOffset, length));
long offset = getOffset(dictId);
long length = getOffset(dictId + 1) - offset;
return new ImmutableRoaringBitmap(_bitmapBuffer.toDirectByteBuffer(offset - _firstOffset, (int) length));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the maximum size of a RoaringBitmap is 2^16 * (8KB + 2B) ~ 512MB so the cast to int never overflows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to add this comments of analysis to the code above as well?

@richardstartin
Copy link
Member Author

This isn’t truly backward compatible - all servers need to be upgraded before the first segment with a bitmap index larger than 2GB is completed.

@codecov-commenter
Copy link

codecov-commenter commented May 27, 2022

Codecov Report

Merging #8796 (bd6a452) into master (d281a1b) will decrease coverage by 6.88%.
The diff coverage is 90.90%.

❗ Current head bd6a452 differs from pull request most recent head ff48ed7. Consider uploading reports for the commit ff48ed7 to get more accurate results

@@             Coverage Diff              @@
##             master    #8796      +/-   ##
============================================
- Coverage     69.69%   62.81%   -6.89%     
+ Complexity     4617     4602      -15     
============================================
  Files          1733     1687      -46     
  Lines         91188    89200    -1988     
  Branches      13630    13409     -221     
============================================
- Hits          63555    56027    -7528     
- Misses        23216    29128    +5912     
+ Partials       4417     4045     -372     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 66.15% <90.90%> (+<0.01%) ⬆️
unittests2 14.14% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...nt/creator/impl/inv/BitmapInvertedIndexWriter.java 88.09% <83.33%> (-1.91%) ⬇️
...gment/index/readers/BitmapInvertedIndexReader.java 100.00% <100.00%> (ø)
...va/org/apache/pinot/core/routing/RoutingTable.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pinot/common/config/NettyConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...apache/pinot/common/helix/ExtraInstanceConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...ache/pinot/server/access/AccessControlFactory.java 0.00% <0.00%> (-100.00%) ⬇️
... and 400 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d281a1b...ff48ed7. Read the comment docs.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we check overflow currently? I feel the old code won't fail on the writer side, but will fail on the reader side when the size is over 2GB?

@amrishlal
Copy link
Contributor

Do we check overflow currently? I feel the old code won't fail on the writer side, but will fail on the reader side when the size is over 2GB?

I don't think overflow is being checked. This is the stack trace I saw while attempting to generate a very large index (2GB+). This was more of a test rather than a production usecase, so we don't really need the limit to be increased, but if it can be done easily then no harm. Although, it would be good to check the overflow and add an error message.

Error: java.lang.RuntimeException: java.lang.IllegalArgumentException: Negative position at org.apache.pinot.hadoop.job.mappers.SegmentCreationMapper.map(SegmentCreationMapper.java:310) at org.apache.pinot.hadoop.job.mappers.SegmentCreationMapper.map(SegmentCreationMapper.java:66) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171) Caused by: java.lang.IllegalArgumentException: Negative position at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:863) at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.mapBitmapBuffer(BitmapInvertedIndexWriter.java:102) at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.resizeIfNecessary(BitmapInvertedIndexWriter.java:95) at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.add(BitmapInvertedIndexWriter.java:73) at org.apache.pinot.segment.local.segment.creator.impl.inv.json.OnHeapJsonIndexCreator.seal(OnHeapJsonIndexCreator.java:57) at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:560) at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:266) at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:238) at org.apache.pinot.hadoop.job.mappers.SegmentCreationMapper.map(SegmentCreationMapper.java:277) ... 9 more Suppressed: java.lang.IllegalArgumentException: Negative size at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:324) at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.close(BitmapInvertedIndexWriter.java:118) at org.apache.pinot.segment.local.segment.creator.impl.inv.json.OnHeapJsonIndexCreator.seal(OnHeapJsonIndexCreator.java:50) ... 13 more

@richardstartin
Copy link
Member Author

Do we check overflow currently? I feel the old code won't fail on the writer side, but will fail on the reader side when the size is over 2GB?

Currently the code fails on the writer side, but would also fail on the reader side. The right way to evolve this is to make these changes, but continue failing beyond 2GB on the writer side, then increase to 4GB the version after.

Copy link
Contributor

@amrishlal amrishlal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be good to add some comments for future reference, but otherwise looks good to me. The compatibility issue doesn't appear to be concern (at least on the clusters I work with) since all the index files are less than 2GB.

@siddharthteotia
Copy link
Contributor

Shall we add a test where we generate 2 bitmap inv indexes > 2GB and < 2GB respectively and the reader code changed in this PR can read both indexes correctly ?

@richardstartin
Copy link
Member Author

Shall we add a test where we generate 2 bitmap inv indexes > 2GB and < 2GB respectively and the reader code changed in this PR can read both indexes correctly ?

< 2GB is already tested. Tests could be added for > 2GB, with the downside of needing to create large files.

@jackjlli jackjlli added the backward-incompat Referenced by PRs that introduce or fix backward compat issues label May 31, 2022
@richardstartin
Copy link
Member Author

@jackjlli this is not backward incompatible

@jackjlli
Copy link
Contributor

jackjlli commented May 31, 2022

This isn’t truly backward compatible - all servers need to be upgraded before the first segment with a bitmap index larger than 2GB is completed.

I added the label as you mentioned the following sentence:

This isn’t truly backward compatible - all servers need to be upgraded before the first segment with a bitmap index larger than 2GB is completed.

I think the label is used for identifying any action needed if a newer version of Pinot is bumped in the service, so that ppl are aware of those actions. Feel free to update or remove the label if you feel it's unnecessary.

@richardstartin
Copy link
Member Author

Segments produced by the existing code can still be consumed, but the existing code isn’t forward compatible with a limit increase. This change can be made safely in two stages as mentioned above: 1) release these code changes but enforce a 2GB limit, which makes the reader code forward compatible with a limit increase but can’t produce unreadable segments, then 2) relax the 2GB limit in the next release. This just requires upgrading users not to skip versions.

@jackjlli jackjlli added release-notes Referenced by PRs that need attention when compiling the next release notes and removed backward-incompat Referenced by PRs that introduce or fix backward compat issues labels Jun 1, 2022
@siddharthteotia siddharthteotia merged commit 4269bfd into apache:master Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants