Rewrite the Hash Index overflow file to support multiple copies #3012
See #2938
This merges the `InMemFile` (previously used by the `HashIndexBuilder` to write to the string overflow file) and the `DiskOverflowFile` (used by the persistent `HashIndex`) into a single `OverflowFile`. It combines elements of both so that all 256 concurrent hash indices can resume writing from where they previously left off.

The overflow file's next-byte-position record in the WAL file has been replaced with a header page in the overflow file. The header stores 256 `PageCursor`s, which serve the same purpose but support 256 concurrent writers. The header also stores a page counter that records the next page to be written. This is necessary because newly added pages bypass the WAL file and are written directly to the overflow file. If a transaction is interrupted while flushing, those pages remain in the file, so the counter in the header is used instead of the file size.

I've benchmarked this and copy performance is more or less identical (for the first copy). Additional copies can't be measured until support is finished, but they should perform about the same as the first, except that writes to the header and to the first incomplete page for each of the 256 writers will go through the WAL file.
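As a rough illustration of the header page described above, here is a minimal sketch. It is not the code from this PR: the field layout, the 4096-byte page size, and the names `OverflowFileHeader`, `allocatePage`, `toPage`, `fromPage`, and `physicalPageCount` are all assumptions made for the example. It shows why recovery reads the persisted counter rather than the file size: pages flushed by an interrupted transaction are still physically present, but the header never recorded them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Assumed constants for this sketch only.
constexpr std::size_t kNumHashIndexes = 256;
constexpr std::size_t kPageSize = 4096;

// Hypothetical cursor layout: the page a writer is on and its offset within it.
struct PageCursor {
    uint32_t pageIdx;
    uint16_t elemPosInPage;
};

// Hypothetical header stored in page 0 of the overflow file.
struct OverflowFileHeader {
    // One write position per hash index, so each writer can resume independently.
    std::array<PageCursor, kNumHashIndexes> cursors{};
    // Next page to hand out; starts at 1 because page 0 holds this header.
    uint32_t nextPageToWrite = 1;
};

static_assert(std::is_trivially_copyable_v<OverflowFileHeader>,
              "header must be safe to copy byte-for-byte");
static_assert(sizeof(OverflowFileHeader) <= kPageSize,
              "header must fit in a single page");

// Hands out a new page index and advances the counter.
inline uint32_t allocatePage(OverflowFileHeader& header) {
    return header.nextPageToWrite++;
}

// Serializes the header into a page-sized buffer (what would be persisted).
inline std::array<uint8_t, kPageSize> toPage(const OverflowFileHeader& header) {
    std::array<uint8_t, kPageSize> page{};
    std::memcpy(page.data(), &header, sizeof(header));
    return page;
}

// Rebuilds the header from a persisted page.
inline OverflowFileHeader fromPage(const std::array<uint8_t, kPageSize>& page) {
    OverflowFileHeader header;
    std::memcpy(&header, page.data(), sizeof(header));
    return header;
}

// Page count implied by the file size alone. This can overcount after an
// interrupted transaction, which is why the header counter is used instead.
inline uint64_t physicalPageCount(uint64_t fileSizeBytes) {
    return fileSizeBytes / kPageSize;
}
```

In this sketch, a writer that resumes after a restart would read its own entry in `cursors` and continue from there, while new pages would be allocated from `nextPageToWrite` regardless of how large the file is on disk.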