Hash Index Rework #2287

Open
4 of 8 tasks
ray6080 opened this issue Oct 27, 2023 · 0 comments


ray6080 commented Oct 27, 2023

Problems

Our hash index has several usability and performance issues.

Performance-wise:

  1. It doesn't scale to multiple threads.
  2. It doesn't support dynamic rehashing, so the CSV reader is forced to count the exact number of tuples up front, which is often slow.
  3. Per-index Memfile Locking Contention #2626
  4. More optimizations can be integrated to improve the performance of the index data structure, such as fingerprints, stashed buckets, balanced insertions, and displacement, as presented in Dash [1].
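
To make the fingerprint idea in item 4 concrete, here is a minimal sketch in Python. It is not Kùzu code: `Bucket`, `hash64`, `fingerprint`, and `SLOTS_PER_BUCKET` are hypothetical names, and the one-byte fingerprint taken from the top bits of the hash is an assumption for illustration. The point is only that a lookup compares cheap one-byte fingerprints first and touches the full key only on a fingerprint match.

```python
from dataclasses import dataclass, field

SLOTS_PER_BUCKET = 4  # hypothetical bucket capacity, chosen for illustration


def hash64(key) -> int:
    # Stand-in for the index's real hash function: Python's hash() masked to 64 bits.
    return hash(key) & 0xFFFFFFFFFFFFFFFF


def fingerprint(h: int) -> int:
    # Use the top 8 bits of the 64-bit hash as a one-byte fingerprint.
    return (h >> 56) & 0xFF


@dataclass
class Bucket:
    fingerprints: list = field(default_factory=list)  # one byte per occupied slot
    entries: list = field(default_factory=list)       # (key, value) pairs

    def insert(self, key, value) -> bool:
        # Returns False when the bucket is full; a real index would then
        # split, displace, or fall back to a stash/overflow bucket.
        if len(self.entries) >= SLOTS_PER_BUCKET:
            return False
        self.fingerprints.append(fingerprint(hash64(key)))
        self.entries.append((key, value))
        return True

    def lookup(self, key):
        fp = fingerprint(hash64(key))
        for i, stored in enumerate(self.fingerprints):
            # Compare the full key only when the fingerprint matches.
            if stored == fp and self.entries[i][0] == key:
                return self.entries[i][1]
        return None
```

With string keys this matters most, since a fingerprint mismatch avoids dereferencing and comparing the string at all.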

Usability-wise:

  1. It wasn't written in a way that scales to different key data types. (Support most simple types in the hash index #2728)
  2. It limits string keys to 4KB or less. (fixed in Increase the length limit for primary key strings to 256KB #2689)
  3. Parallel Hash Index Database Bloat #2625
  4. It needs to support multi-copy.

TODOs

  • Parallel hash index.
  • Support dynamic growth and remove tuple counting from the CSV reader.
  • Rework string layout to get rid of ku_string_t.
  • Add fingerprint optimization.
  • Rework to scale to various key data types. (Support most simple types in the hash index #2728)
  • Separate the hash index building into a separate physical operator.
  • Add support for CREATE INDEX and for altering a node table to define its primary key (defining the primary key when defining a node table would no longer be required, but the primary key must exist when defining a rel table over it).
  • Merge hash indexes into a single file (to be debated: merge directly into data.kz or keep a separate index.kz file).
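
As a rough illustration of the "dynamic growth" item, the sketch below shows an index that grows itself during insertion, so a bulk loader never has to count tuples beforehand. It is an assumption-laden toy, not the planned design: `GrowableIndex`, `initial_slots`, and `max_load` are hypothetical names, and it uses plain doubling with a full rehash and linear probing rather than an incremental scheme such as the one in Dash.

```python
class GrowableIndex:
    """Toy open-addressing index that doubles its slot array on demand."""

    def __init__(self, initial_slots: int = 8, max_load: float = 0.75):
        self.slots = [None] * initial_slots  # each slot is None or (key, value)
        self.size = 0
        self.max_load = max_load

    def _probe(self, slots, key) -> int:
        # Linear probing: stop at the key's slot or at the first empty slot.
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % len(slots)
        return i

    def _grow(self) -> None:
        # Double the capacity and rehash every existing entry.
        new_slots = [None] * (len(self.slots) * 2)
        for entry in self.slots:
            if entry is not None:
                new_slots[self._probe(new_slots, entry[0])] = entry
        self.slots = new_slots

    def insert(self, key, value) -> bool:
        # Grow before the load factor would be exceeded, so probing always
        # finds an empty slot. Returns False on a duplicate key.
        if self.size + 1 > self.max_load * len(self.slots):
            self._grow()
        i = self._probe(self.slots, key)
        if self.slots[i] is not None:
            return False
        self.slots[i] = (key, value)
        self.size += 1
        return True

    def lookup(self, key):
        entry = self.slots[self._probe(self.slots, key)]
        return None if entry is None else entry[1]
```

A loader can then stream rows straight into the index, e.g. `for offset, key in enumerate(keys): idx.insert(key, offset)`, without a separate counting pass over the CSV file.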

Footnotes

  1. Dash: Scalable Hashing on Persistent Memory
