
Parallel Hash Index #2615

Merged (3 commits, Jan 3, 2024)

Commits on Jan 3, 2024

  1. function: use splitmix64 for hashing

    SplitMix64 is an excellent integer hashing function. According to [this
    blog][1], it is the main function to beat for integer hashing. It is
    simple and produces much better-distributed output than our previous
    hash functions.
    
    In particular, this function does a good job of shuffling the higher
    bits of the output, a property critical for the new hash index design.
    Riolku committed Jan 3, 2024
    c1b5fb7
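For reference, here is a minimal Python sketch of the standard SplitMix64 mixing function applied to a 64-bit integer key. It assumes the key is offset by the SplitMix64 increment 0x9E3779B97F4A7C15 before mixing, as in the reference generator step; whether this PR's implementation does the same is not stated, and the names `splitmix64` and `top_byte` are illustrative. The usage at the bottom contrasts the top byte of an identity "hash" with the top byte of SplitMix64 for small keys.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


def splitmix64(x: int) -> int:
    """Hash a 64-bit integer with the SplitMix64 mixer.

    Assumption: the input is first offset by the SplitMix64 gamma
    (0x9E3779B97F4A7C15), as in the reference generator step.
    """
    z = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def top_byte(h: int) -> int:
    """Return the highest 8 bits of a 64-bit hash."""
    return h >> 56


if __name__ == "__main__":
    keys = range(4096)
    # An identity "hash" leaves the top byte of every small key at zero...
    print(len({top_byte(k) for k in keys}))  # 1
    # ...while SplitMix64 spreads the same keys across the top-byte values.
    print(len({top_byte(splitmix64(k)) for k in keys}))
```

Small consecutive integers share their high bits, so any scheme that relies on the high bits of a hash needs a mixer that pushes low-bit differences upward; every step above (add, xor-shift, multiply by an odd constant) is a bijection on 64-bit values, so distinct keys never collide.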
  2. storage: use parallel hash index

    The design is quite simple: every hash index is now represented
    internally as 256 hash indexes. This way, when copying, multiple threads
    can easily operate on different indexes at once without locking.
    Riolku committed Jan 3, 2024
    d9746ca
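A minimal Python sketch of the partitioned design, under assumptions the commit does not state: a key's partition is chosen from the top 8 bits of its hash (256 = 2^8 partitions, which would explain why the first commit stresses well-shuffled high bits), each partition is a plain dict, and a bulk copy first groups rows by partition so that every worker task owns exactly one partition. `PartitionedHashIndex`, `partition_of`, and `bulk_insert` are hypothetical names. In CPython the GIL limits real parallelism, so the sketch shows the lock-free ownership structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

MASK64 = (1 << 64) - 1
NUM_PARTITIONS = 256  # 2**8, so the top 8 bits of a hash pick a partition


def hash64(x: int) -> int:
    """SplitMix64 mixer, used here as the (assumed) key hash."""
    z = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def partition_of(key: int) -> int:
    """Hypothetical routing rule: the highest 8 hash bits select the partition."""
    return hash64(key) >> 56


class PartitionedHashIndex:
    """One logical index backed by 256 independent sub-indexes."""

    def __init__(self) -> None:
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def lookup(self, key: int):
        """Return the value stored for key, or None if it is absent."""
        return self.partitions[partition_of(key)].get(key)

    def bulk_insert(self, pairs, num_threads: int = 8) -> None:
        """Copy (key, value) pairs in, one task per partition."""
        # Step 1: group the input rows by destination partition.
        buckets = [[] for _ in range(NUM_PARTITIONS)]
        for key, value in pairs:
            buckets[partition_of(key)].append((key, value))

        # Step 2: each task writes to exactly one partition, so no two
        # tasks ever share a dict and no lock is needed.
        def fill(p: int) -> None:
            self.partitions[p].update(buckets[p])

        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            list(pool.map(fill, range(NUM_PARTITIONS)))  # re-raises task errors


if __name__ == "__main__":
    index = PartitionedHashIndex()
    index.bulk_insert((k, k * 10) for k in range(10_000))
    print(index.lookup(42))  # 420
    print(index.lookup(-1))  # None
```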
  3. processor: use queue-based index building

    This also moves index building to its own file. Future work may move it
    to its own standalone operator.
    
    These changes break the RDF tests, so those tests have been disabled.
    They also increase memory usage, so the LDBC and LSQB buffer pool sizes
    have been adjusted. In exchange, they vastly improve performance:
    ingesting 100 million integers from a Parquet file with 64 threads takes
    about 90 seconds on master, but about 5 seconds with this change.
    Riolku committed Jan 3, 2024
    da0e70f
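The third commit does not describe how its queues are organized, so the following Python sketch is only one plausible reading of "queue-based index building": scanning threads hash each key, group a chunk's rows by partition, and push each group onto that partition's queue, while one builder thread per partition drains its own queue into its own dict. Because each partition still has a single writer, the index itself needs no lock; the thread-safe `queue.Queue` is the only synchronization. `build_index`, the chunking, and the sentinel-based shutdown are hypothetical.

```python
import queue
import threading

MASK64 = (1 << 64) - 1
NUM_PARTITIONS = 256
_DONE = object()  # sentinel telling a builder that its queue is finished


def hash64(x: int) -> int:
    """SplitMix64 mixer, used here as the (assumed) key hash."""
    z = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def build_index(chunks, num_scanners: int = 4):
    """Build 256 partition dicts from chunks of (key, value) pairs."""
    chunks = list(chunks)
    queues = [queue.Queue() for _ in range(NUM_PARTITIONS)]
    partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def builder(p: int) -> None:
        # Sole writer of partitions[p]: drain batches until the sentinel.
        while (batch := queues[p].get()) is not _DONE:
            partitions[p].update(batch)

    def scanner(my_chunks) -> None:
        # Producer: group each chunk by partition, then enqueue each group.
        for chunk in my_chunks:
            groups = {}
            for key, value in chunk:
                groups.setdefault(hash64(key) >> 56, []).append((key, value))
            for p, batch in groups.items():
                queues[p].put(batch)

    builders = [threading.Thread(target=builder, args=(p,))
                for p in range(NUM_PARTITIONS)]
    scanners = [threading.Thread(target=scanner, args=(chunks[i::num_scanners],))
                for i in range(num_scanners)]
    for t in builders + scanners:
        t.start()
    for t in scanners:
        t.join()  # every batch is now enqueued
    for q in queues:
        q.put(_DONE)  # queues are FIFO, so the sentinel follows every batch
    for t in builders:
        t.join()
    return partitions


if __name__ == "__main__":
    data = [(k, str(k)) for k in range(10_000)]
    chunks = [data[i:i + 1000] for i in range(0, len(data), 1000)]
    parts = build_index(chunks)
    print(sum(len(p) for p in parts))  # 10000
```

Decoupling scanning from insertion this way lets producers keep reading input while builders work, at the cost of buffering batches in the queues, which is at least consistent with the higher memory usage the commit mentions.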