Rewrite the Hash Index overflow file to support multiple copies #3012

Merged (1 commit) on Mar 12, 2024

benjaminwinger (Collaborator)

See #2938

This merges the InMemFile (previously used by the HashIndexBuilder to write to the string overflow file) and the DiskOverflowFile (used by the persistent HashIndex) into a single OverflowFile. It combines elements of both so that all 256 concurrent hash indices can continue writing from where they previously left off.

The overflow file's next-byte-position record in the WAL file has been replaced with a header page in the overflow file. The header stores 256 PageCursors, which serve the same purpose but support 256 concurrent writers. It also stores a page counter that records the next page to be written to. The counter is necessary because newly added pages bypass the WAL file and are written directly to the overflow file; if a transaction is interrupted while flushing, those pages remain in the file, so the counter in the header is used instead of the size of the file.
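
To make the header bookkeeping concrete, here is a minimal single-threaded sketch of a header page holding one cursor per writer plus a next-page counter. All names (`PageCursorSketch`, `OverflowHeaderSketch`, `reserve`, `storeHeader`, `loadHeader`), the 4096-byte page size, and the choice of page 0 as the header are assumptions for illustration, not the definitions used in this PR. It also ignores locking and does not split a value across pages.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical names and sizes; not the PR's actual definitions.
constexpr std::size_t NUM_WRITERS = 256;      // one writer per hash index
constexpr std::uint32_t PAGE_SIZE = 4096;     // assumed page size
constexpr std::uint32_t HEADER_PAGE_IDX = 0;  // assumed: header lives in page 0
constexpr std::uint32_t NO_PAGE = UINT32_MAX; // writer has no page yet

struct PageCursorSketch {
    std::uint32_t pageIdx = NO_PAGE;
    std::uint16_t offsetInPage = 0;
};

struct OverflowHeaderSketch {
    std::array<PageCursorSketch, NUM_WRITERS> cursors{};
    // Next page to hand out. Used instead of the file size, because pages
    // left behind by an interrupted flush may still be present in the file.
    std::uint32_t nextPageIdx = HEADER_PAGE_IDX + 1;
};
static_assert(sizeof(OverflowHeaderSketch) <= PAGE_SIZE, "header must fit in one page");
static_assert(std::is_trivially_copyable_v<OverflowHeaderSketch>, "copied with memcpy");

// Reserve len bytes for one writer and return where those bytes start.
inline PageCursorSketch reserve(OverflowHeaderSketch& header, std::size_t writer,
                                std::uint16_t len) {
    assert(writer < NUM_WRITERS && len <= PAGE_SIZE);
    PageCursorSketch& cur = header.cursors[writer];
    const std::uint32_t end = static_cast<std::uint32_t>(cur.offsetInPage) + len;
    if (cur.pageIdx == NO_PAGE || end > PAGE_SIZE) {
        // Take a fresh page from the counter, never from the file size.
        cur.pageIdx = header.nextPageIdx++;
        cur.offsetInPage = 0;
    }
    const PageCursorSketch start = cur;
    cur.offsetInPage = static_cast<std::uint16_t>(cur.offsetInPage + len);
    return start;
}

// Persist the header into a page-sized buffer, and read it back.
inline void storeHeader(const OverflowHeaderSketch& header, std::uint8_t* page) {
    std::memcpy(page, &header, sizeof(header));
}

inline OverflowHeaderSketch loadHeader(const std::uint8_t* page) {
    OverflowHeaderSketch header;
    std::memcpy(&header, page, sizeof(header));
    return header;
}
```

Reloading the header with `loadHeader` restores every writer's cursor and the page counter, which is what would let a later copy resume where the previous one stopped.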

I've benchmarked this and copy performance is more or less identical (for the first copy). For additional copies we will see once support is finished, but they should perform more or less the same as the first, except that writes to the header and to the first incomplete page of each of the 256 writers will go through the WAL file.
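
One way to read the split between WAL and direct writes is as a rule on the page index: pages that already existed at the last commit (the header and each writer's partially filled page) go through the WAL, while pages allocated since then are written straight to the file. The sketch below encodes that reading. The rule itself, `WritePath`, `choosePath`, and `committedNextPageIdx` are assumptions for illustration and are not taken from the PR's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the decision rule is an assumption based on the
// description above, not the PR's implementation.
enum class WritePath { ThroughWAL, DirectToFile };

constexpr std::uint32_t HEADER_PAGE_IDX = 0; // assumed: header lives in page 0

// committedNextPageIdx is the page counter stored in the header as of the
// last successful commit. Pages below it existed before this transaction.
inline WritePath choosePath(std::uint32_t pageIdx, std::uint32_t committedNextPageIdx) {
    // The header page always exists, so the counter is at least 1.
    assert(committedNextPageIdx > HEADER_PAGE_IDX);
    if (pageIdx == HEADER_PAGE_IDX || pageIdx < committedNextPageIdx) {
        // Overwriting committed data must be recoverable, so log it.
        return WritePath::ThroughWAL;
    }
    // A new page is invisible until the header counter is committed, so an
    // interrupted flush only leaves unreferenced pages behind.
    return WritePath::DirectToFile;
}
```

Under this reading, only the header plus at most one page per writer would take the slower WAL path in an additional copy, which is consistent with the expectation that later copies perform about the same as the first.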

codecov bot commented Mar 8, 2024

Codecov Report

Attention: Patch coverage is 94.50549%, with 10 lines in your changes missing coverage. Please review.

Project coverage is 93.28%. Comparing base (38e4398) to head (7c25a3b).
Report is 15 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| src/storage/storage_structure/overflow_file.cpp | 93.28% | 9 Missing ⚠️ |
| src/storage/index/hash_index.cpp | 92.85% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3012      +/-   ##
==========================================
+ Coverage   93.25%   93.28%   +0.03%     
==========================================
  Files        1124     1128       +4     
  Lines       42934    42950      +16     
==========================================
+ Hits        40040    40068      +28     
+ Misses       2894     2882      -12     

@benjaminwinger benjaminwinger merged commit 0c26056 into master Mar 12, 2024
15 checks passed
@benjaminwinger benjaminwinger deleted the multi-copy-overflow-file branch March 12, 2024 19:54