Fix performance problem of base chunk forward index write #7930
Conversation
This makes sense. The impact on fixed-width data, which always uses tiny buffers, wasn't considered when removing the buffer for compressed chunks.
Codecov Report
@@             Coverage Diff             @@
##             master    #7930     +/-  ##
==========================================
  Coverage     71.15%   71.15%
- Complexity     4111     4113      +2
==========================================
  Files          1593     1593
  Lines         82365    82362      -3
  Branches      12270    12269      -1
==========================================
- Hits          58609    58607      -2
  Misses        19806    19806
+ Partials       3950     3949      -1
For posterity, we do have a benchmark which would catch this regression if it were to happen again: master:
branch:
For comparison, the mmap approach does work well for the V4 raw index writer, which on this branch is slightly slower to build than V3 but is guaranteed never to use more than 1MB of memory (and is slightly faster to read according to benchmarks):
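The bounded-memory property of a mmap-based writer comes from mapping a fixed-size window of the file and sliding it forward as data is written. Below is a minimal generic sketch of that pattern, not Pinot's actual V4 writer; the 1MB window size, class name, and temp-file handling are assumptions for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapWindowSketch {
    private static final int WINDOW = 1 << 20; // at most 1MB mapped at a time

    /** Writes numWindows windows of data, remapping each time, and returns the file size. */
    static long writeWindows(int numWindows) {
        try {
            Path file = Files.createTempFile("v4-sketch", ".fwd");
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                long offset = 0;
                for (int w = 0; w < numWindows; w++) {
                    // Map the next fixed-size window; the previous mapping is
                    // dropped, so live mapped memory stays bounded by WINDOW.
                    MappedByteBuffer buf =
                        channel.map(FileChannel.MapMode.READ_WRITE, offset, WINDOW);
                    while (buf.hasRemaining()) {
                        buf.put((byte) 1); // stand-in for real index data
                    }
                    offset += WINDOW;
                }
                return channel.size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeWindows(3)); // 3 windows of 1MB each
    }
}
```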
_dataChannel = new RandomAccessFile(file, "rw").getChannel();
_header = _dataChannel.map(FileChannel.MapMode.READ_WRITE, 0, _dataOffset);
writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
_compressedBuffer = ByteBuffer.allocateDirect(chunkSize * 2);
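The allocateDirect line is the core of the fix: one worst-case-sized direct buffer allocated once and reused for every chunk, instead of mmapping a fresh region per compressed chunk. A minimal self-contained sketch of that write pattern, where the names, sizes, and the dummy "compression" step are illustrative rather than Pinot's actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkWriterSketch {
    /** Writes numChunks dummy "compressed" chunks of chunkBytes each; returns the file size. */
    static long writeChunks(int numChunks, int chunkBytes) {
        try {
            Path file = Files.createTempFile("chunks", ".fwd");
            // One reusable direct buffer sized for the worst case, allocated
            // up front rather than mapped per chunk (here 2x as in the diff).
            ByteBuffer compressedBuffer = ByteBuffer.allocateDirect(chunkBytes * 2);
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
                for (int chunk = 0; chunk < numChunks; chunk++) {
                    compressedBuffer.clear();
                    // Stand-in for _chunkCompressor.compress(...): fill with dummy bytes.
                    for (int i = 0; i < chunkBytes; i++) {
                        compressedBuffer.put((byte) i);
                    }
                    compressedBuffer.flip();
                    channel.write(compressedBuffer); // one sequential write per chunk
                }
            }
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeChunks(3, 100)); // prints 300
    }
}
```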
I suggest using ByteBuffer.allocateDirect(_chunkCompressor.maxCompressedSize(chunkSize)) instead, because most compressors can produce a much tighter bound than 2x. E.g. LZ4 (the default) produces 8240 for 8192 and 1052704 for 1MB.
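For illustration, LZ4's worst-case bound follows a simple formula, the one used by LZ4_compressBound in the reference implementation (incompressible input grows by at most srcLen/255 + 16 bytes), which reproduces the numbers above:

```java
public class Lz4Bound {
    // LZ4's worst-case compressed size for a given input size, i.e. the
    // formula behind LZ4_compressBound: srcLen + srcLen/255 + 16.
    static int lz4MaxCompressedLength(int srcLen) {
        return srcLen + srcLen / 255 + 16;
    }

    public static void main(String[] args) {
        System.out.println(lz4MaxCompressedLength(8192));        // prints 8240
        System.out.println(lz4MaxCompressedLength(1024 * 1024)); // prints 1052704
    }
}
```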
Good suggestion. Refactored.
@sajjad-moradi @richardstartin, I think 2x was being used to account for the worst case, where negative compression (output larger than input) is possible, although unlikely imo. If we are 100% sure that our compression codecs (LZ4, Snappy, ZSTD) are never going to result in negative compression for any kind of data, using chunkSize is fine. Otherwise, I suggest moving it back to 2x (the old code) or at least something larger than chunkSize.
Each compression codec provides a bound for the worst case, i.e. when the data is incompressible and the result is larger than the input.
Do we have these bounds available at run time (preferably as an API)? Otherwise, instead of trying to optimize for something we don't know, we should revert to the code that we know works in a production environment.
These bounds are specified by the compression codecs themselves. They take the size of the uncompressed input as a parameter and produce the largest possible compressed size for that input size. This isn't trying to optimise something we don't know, but making proper use of compression libraries. Playing devil's advocate, I'm curious how you know 2x is large enough for any data? Wouldn't you prefer to use the compression library's specified bounds?
@siddharthteotia What do you mean by negative compression? If you mean the compressed size being larger than the input size, then this maxCompressedSize function returns the upper bound for that, so we don't always need to go with the 2x size.
BTW here are the max values for an input size of 1000 with the different compressors:
pass through: 1000
Snappy: 1198
Zstandard: 1066
LZ4: 1019
LZ4 with length compressor: 1023
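These per-codec bounds come from simple closed-form formulas. A hedged sketch reproducing the LZ4, Snappy, and Zstandard numbers above, using formulas mirroring what the native libraries' LZ4_compressBound, snappy::MaxCompressedLength, and ZSTD_compressBound compute (stated here as an illustration; the pass-through and length-prefixed variants are omitted):

```java
public class CompressBounds {
    // LZ4_compressBound: srcLen + srcLen/255 + 16
    static int lz4Bound(int n) {
        return n + n / 255 + 16;
    }

    // snappy::MaxCompressedLength: 32 + srcLen + srcLen/6
    static int snappyBound(int n) {
        return 32 + n + n / 6;
    }

    // ZSTD_COMPRESSBOUND: srcLen + srcLen/256, plus a margin for small inputs
    static int zstdBound(int n) {
        int margin = n < (128 << 10) ? ((128 << 10) - n) >> 11 : 0;
        return n + (n >> 8) + margin;
    }

    public static void main(String[] args) {
        System.out.println(lz4Bound(1000));    // prints 1019
        System.out.println(snappyBound(1000)); // prints 1198
        System.out.println(zstdBound(1000));   // prints 1066
    }
}
```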
Description
This PR fixes the performance issue of BaseChunkSVForwardIndexWriter by using a byte buffer instead of memory-mapping each compressed chunk. Please refer to #7929 for more details.
Testing Done
For one table that has a non-dictionary column, the existing code takes more than 30 minutes to build the index, but with the fix it takes around one minute.