
Integer Bitpacking extensions #2083

Open
5 of 6 tasks
benjaminwinger opened this issue Sep 25, 2023 · 1 comment

Comments

@benjaminwinger
Collaborator

benjaminwinger commented Sep 25, 2023

From the extra TODOs in #2004, ordered roughly from simplest to most complicated.

  • Var List offsets can be bitpacked
  • InternalIDs can be bitpacked
  • INT16 and INT8 bitpacking
  • Small reads/writes to bitpacked data could be optimized to work on individual data instead of relying on packing/unpacking a 32-value chunk to a temporary array.
  • Extra values could be packed into the end of each page when the page does not divide evenly into 32-value chunks.
  • SIMD bitpacking

Notes

InternalIDs

Stored as offset_t (uint64_t), but an extra layer would be needed to widen the results into the internal_id_t struct. This could cause significant slowdowns when decompressing, since we wouldn't be able to use the fast mass-decompression functions directly. Since the current random-access implementation is not very efficient (it unpacks 32 values to get one), it would probably be more performant to unpack to a temporary buffer and then copy the values into the value vector.

Filling the extra space at the end of each page

This would be trivial to implement once direct reads and writes of bitpacked data are implemented.

SIMD

Mostly complicated due to handling compiler support.

Benchmarks

It would also be helpful to set up some small benchmarks (maybe using google benchmark; essentially what I did when comparing implementations at the start) so that we can quickly compare compression/decompression speed without the overhead of the rest of the system (not that such integrated benchmarks aren't also useful).

@benjaminwinger
Collaborator Author

std::experimental::simd is available with GCC 11 and newer (and possibly Clang; searches found references in their repos, but no docs).

According to this, it's planned to be merged into the standard in C++26.
It might be a reasonably easy way of doing custom SIMD, e.g. for the sign extension and frame-of-reference calculations, with the caveat that it would only be available in certain environments (though availability is likely to increase as C++26 gets closer).
