Reclaim empty overflow slots in memory hash index #3438
Merged
When splitting slots, move empty overflow slots onto a global linked list (a free list) and reuse them before allocating new slots.
I started working on this to simplify how deletions will work in the in-memory hash index (needed to unify the hash index local storage for copies and inserts/deletions). Removing empty overflow slots from the end of the slot chains makes it easier to find the last entry in a slot, since there is no need to backtrack when the last slot is empty.
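The free-list idea can be sketched roughly as follows. This is a simplified model, not the index's actual code: it assumes overflow slots live in a single vector and chains are linked by slot index, and `Slot`, `OverflowSlotPool`, `INVALID_SLOT` and `reclaimEmpty` are hypothetical names. After a split moves entries out of a chain, every empty overflow slot is unlinked and pushed onto the free list, so the chain always ends in a non-empty slot and `allocate` reuses freed slots before growing the vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for "no slot" (end of a chain or of the free list).
constexpr uint64_t INVALID_SLOT = UINT64_MAX;

struct Slot {
    std::vector<int64_t> entries;  // keys stored in this slot
    uint64_t next = INVALID_SLOT;  // next slot in the chain, or free-list link
};

// Hypothetical pool of overflow slots with a global free list threaded
// through the `next` field of released slots.
struct OverflowSlotPool {
    std::vector<Slot> slots;
    uint64_t freeHead = INVALID_SLOT;

    // Reuse a released slot if one exists; otherwise append a new slot.
    uint64_t allocate() {
        if (freeHead != INVALID_SLOT) {
            uint64_t idx = freeHead;
            freeHead = slots[idx].next;
            slots[idx].next = INVALID_SLOT;
            return idx;
        }
        slots.emplace_back();
        return slots.size() - 1;
    }

    // Push an empty slot onto the front of the free list.
    void release(uint64_t idx) {
        assert(slots[idx].entries.empty());
        slots[idx].next = freeHead;
        freeHead = idx;
    }

    // Called after a split: unlink every empty overflow slot in the chain
    // starting at `chainHead` and release it, so the chain ends in a
    // non-empty slot (no backtracking needed to find the last entry).
    void reclaimEmpty(uint64_t& chainHead) {
        uint64_t* link = &chainHead;
        while (*link != INVALID_SLOT) {
            uint64_t idx = *link;
            if (slots[idx].entries.empty()) {
                *link = slots[idx].next;  // unlink before release overwrites next
                release(idx);
            } else {
                link = &slots[idx].next;
            }
        }
    }
};
```

Threading the free list through the same `next` field used by the chains means the free list costs only one extra head index, which would have to be stored somewhere persistent alongside the rest of the index state.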
This ended up being a little more complicated than I expected, but for a hash index of 60 million consecutive integers it reduces the memory used by slots from roughly 2.28 GB to 1.75 GB, cuts the number of overflow slots by more than half, and seems to slightly improve performance (presumably because it reduces the number of allocations).
Something similar could be done for disk slots (see the last TODO in #2938 (comment)); the main difference is that disk slots may have gaps, but those gaps could be removed when splitting.
This avoids breaking the storage format by adding the new field (which the on-disk index does not use) to the end of the hash index header.
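Why appending keeps the format compatible can be shown with a small layout check. The struct and field names below are hypothetical and only illustrate the principle: because the new field comes after all existing ones, every pre-existing field keeps its byte offset, so bytes written with the old layout still map onto the prefix of the new one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header layouts, for illustration only.
struct HeaderBefore {
    uint64_t currentLevel;
    uint64_t nextSplitSlotId;
    uint64_t numEntries;
};

struct HeaderAfter {
    uint64_t currentLevel;
    uint64_t nextSplitSlotId;
    uint64_t numEntries;
    uint64_t firstFreeOverflowSlotId;  // new field, appended at the end
};

// Existing fields keep their offsets; the new field starts right after them.
static_assert(offsetof(HeaderAfter, currentLevel) == offsetof(HeaderBefore, currentLevel),
              "currentLevel moved");
static_assert(offsetof(HeaderAfter, nextSplitSlotId) == offsetof(HeaderBefore, nextSplitSlotId),
              "nextSplitSlotId moved");
static_assert(offsetof(HeaderAfter, numEntries) == offsetof(HeaderBefore, numEntries),
              "numEntries moved");
static_assert(offsetof(HeaderAfter, firstFreeOverflowSlotId) == sizeof(HeaderBefore),
              "new field must follow all old fields");

// Load bytes in the old layout into the new struct and default the new field.
HeaderAfter readOldHeader(const HeaderBefore& old, uint64_t invalidSlot) {
    HeaderAfter h{};
    std::memcpy(&h, &old, sizeof(HeaderBefore));
    h.firstFreeOverflowSlotId = invalidSlot;
    return h;
}
```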