Insert into the hash index builder one chunk at a time #2997
First part of #2938.
There is a small performance improvement, though it's obscured somewhat by the variance I was seeing between runs: copying a table with just 60 million integer primary keys went from ~2.3 s to ~2.2 s, roughly a 100 ms improvement.
This replaces the one-at-a-time append in `HashIndexBuilder` with an append function that takes a `StaticVector` (since that's what's used in the `IndexBuilder`'s queues). Resizing the index and calculating the hashes are done on all 1024 values in the `StaticVector` at once, before each value is inserted one by one.

I also removed the second type parameter from `HashIndexBuilder`: since the `StaticVector` stores a `std::string`, it makes more sense to hardcode the places which take a `std::string`/`std::string_view` using `std::conditional` than to add more template parameters (and there is only really one variable parameter anyway).