Hash Index Rework #2287

ray6080 · 2023-10-27T18:40:21Z

Problems

Our hash index has several usability and performance issues.

Performance-wise:

~~It doesn't scale to multiple threads.~~
~~It doesn't support dynamic rehashing, so the CSV reader is forced to count the exact number of tuples, which is often slow.~~
~~Per-index Memfile Locking Contention #2626~~
More optimizations can be integrated to improve the performance of the index data structure, such as fingerprints, stashed buckets, balanced insertions, and displacement, as presented in Dash¹.
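
To make the fingerprint idea concrete, here is a minimal sketch, not the actual index code. Each slot stores one byte of the key's hash, and a lookup compares that byte first, so most non-matching slots are rejected without a full key comparison. The names `FingerprintBucket`, `Slot`, `kSlots`, and the choice of FNV-1a as the hash function are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// FNV-1a, used here only so the sketch is deterministic (assumed hash function).
inline uint64_t fnv1a(const std::string& s) {
    uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical slot layout: one byte of the hash is kept next to each entry.
struct Slot {
    uint8_t fingerprint = 0;
    bool occupied = false;
    std::string key;
    uint64_t value = 0;
};

// Hypothetical fixed-capacity bucket; sizes and names are illustrative only.
class FingerprintBucket {
public:
    static constexpr std::size_t kSlots = 8;

    // Use the top byte of the hash so the fingerprint is independent of the
    // low bits that would normally select the bucket.
    static uint8_t fingerprintOf(uint64_t hash) {
        return static_cast<uint8_t>(hash >> 56);
    }

    // Stores the entry in the first free slot. Does not check for duplicates.
    bool insert(const std::string& key, uint64_t value) {
        for (auto& slot : slots) {
            if (!slot.occupied) {
                slot.fingerprint = fingerprintOf(fnv1a(key));
                slot.occupied = true;
                slot.key = key;
                slot.value = value;
                return true;
            }
        }
        return false;  // bucket full; a real index would overflow or split here
    }

    bool lookup(const std::string& key, uint64_t& out) const {
        const uint8_t fp = fingerprintOf(fnv1a(key));
        for (const auto& slot : slots) {
            // Cheap one-byte check; the full key comparison only runs on a match.
            if (!slot.occupied || slot.fingerprint != fp) continue;
            if (slot.key == key) {
                out = slot.value;
                return true;
            }
        }
        return false;
    }

private:
    std::array<Slot, kSlots> slots;
};
```

The benefit is largest for string keys, where a full comparison may need to follow a pointer to an overflow area, while the fingerprint check stays inside the bucket.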

Usability-wise:

~~It wasn't coded in a way to scale to different data types for keys.~~ (Support most simple types in the hash index #2728)
~~It limits string keys to at most 4KB.~~ (fixed in Increase the length limit for primary key strings to 256KB #2689)
Parallel Hash Index Database Bloat #2625
Need to support multi-copy.

Parallel hash index.
Support dynamic growing and remove counting from CSV reader.
Rework string layout to get rid of ku_string_t.
Add fingerprint optimization.
Rework to scale to various key data types. (Support most simple types in the hash index #2728)
Separate the hash index building into a separate physical operator.
Add support for CREATE INDEX and for altering a node table to define its primary key (defining the primary key when creating a node table would no longer be required, but the primary key must exist before a rel table is defined over it).
Merge hash indexes into a single file (to be debated: merge directly into data.kz or keep a separate index.kz file).
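
For the "Support dynamic growing" item above, one scheme that avoids knowing the tuple count up front is linear hashing, where the table grows one bucket at a time by splitting the bucket at a moving split pointer. Whether the rework uses this exact scheme is not stated in this issue, so the following is only a sketch under that assumption. The class `LinearHashTable`, its parameters, and the mixing function are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical in-memory linear-hashing table over uint64_t keys.
class LinearHashTable {
    using Entry = std::pair<uint64_t, uint64_t>;

public:
    explicit LinearHashTable(std::size_t initialBuckets = 4, std::size_t bucketCapacity = 4)
        : baseBuckets(initialBuckets), capacity(bucketCapacity), buckets(initialBuckets) {}

    void insert(uint64_t key, uint64_t value) {
        buckets[bucketFor(key)].emplace_back(key, value);
        ++numEntries;
        // Grow by a single bucket once the average load exceeds the capacity,
        // so no up-front count of the input is needed.
        if (numEntries > capacity * buckets.size()) splitNext();
    }

    bool lookup(uint64_t key, uint64_t& out) const {
        for (const auto& kv : buckets[bucketFor(key)]) {
            if (kv.first == key) {
                out = kv.second;
                return true;
            }
        }
        return false;
    }

    std::size_t numBuckets() const { return buckets.size(); }

private:
    // 64-bit mixing steps (assumed hash; any well-distributed hash would do).
    static uint64_t mix(uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    // Buckets before the split pointer have already been split in this round,
    // so their keys are addressed with the doubled modulus.
    std::size_t bucketFor(uint64_t key) const {
        const uint64_t h = mix(key);
        std::size_t idx = h % baseBuckets;
        if (idx < nextToSplit) idx = h % (2 * baseBuckets);
        return idx;
    }

    void splitNext() {
        const std::size_t oldIdx = nextToSplit;
        buckets.emplace_back();  // new bucket gets index baseBuckets + oldIdx
        std::vector<Entry> moved;
        moved.swap(buckets[oldIdx]);
        ++nextToSplit;
        if (nextToSplit == baseBuckets) {  // round finished: table has doubled
            baseBuckets *= 2;
            nextToSplit = 0;
        }
        // Redistribute only the entries of the bucket that was split.
        for (const auto& e : moved) buckets[bucketFor(e.first)].push_back(e);
    }

    std::size_t baseBuckets;
    std::size_t capacity;
    std::vector<std::vector<Entry>> buckets;
    std::size_t nextToSplit = 0;
    std::size_t numEntries = 0;
};
```

Each insert moves at most one bucket's entries, so growth cost is spread across inserts instead of being paid in one full rehash.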


ray6080 added the performance optimization label Oct 27, 2023

ray6080 assigned ray6080 and Riolku Oct 27, 2023

ray6080 assigned benjaminwinger Jan 11, 2024

benjaminwinger mentioned this issue Feb 16, 2024: Persistent hash index performance improvements #2908 (merged)

benjaminwinger mentioned this issue Feb 23, 2024: Hash Index for multiple copy #2938 (open, 6 tasks)

ray6080 unassigned Riolku Apr 2, 2024