Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hand tune array-bitset aggregates #166

Open
saik0 opened this issue Jan 30, 2022 · 1 comment
Open

Hand tune array-bitset aggregates #166

saik0 opened this issue Jan 30, 2022 · 1 comment

Comments

@saik0
Copy link
Contributor

saik0 commented Jan 30, 2022

arXiv:1709.07821 Roaring Bitmaps: Implementation of an Optimized Software Library (Section 4.1.1) (Section 3.2)

Cardinality tracking was added in #127, however we're leaving some perf on the floor.

We also find that if we use bit-manipulation instructions and hand-tuned assembly, we can roughly
double the speed at which we can change bits in a bitset, compared to compiler-generated machine
code with all optimization flags activated. It is true not only when setting bits, but also when clearing
or flipping them—in which cases we need to use the btr and btc instructions instead. Informally,
we tested various C compilers and could not find any that could perform nearly as well as hand-tuned
assembly code. This may, of course, change in the future.

CRoaring source

Planning

  1. Do we want to implement this optimization? 2x is a huge speed up, but only on x86
  2. The compiler is sufficiently smart to use bts but not sbb, perhaps we can write rust that compiles to the asm we want
  3. If not there's always the asm! macro (nightly only), or if want it in stable then we can write a small C lib for the asm and link it
@saik0
Copy link
Contributor Author

saik0 commented Feb 24, 2022

inline asm stabilized 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant