This optimizes splitting in the in-memory hash index by tracking the insertion position instead of probing from the beginning of the slot each time. The impact is relatively small but noticeable. I wrote it mostly because the other optimization I'm working on (removing the requirement that the in-memory hash index used for multi-copy has the same capacity as the on-disk index) seemed to make the cost of this probing much more significant. That is expected for the second copy, where the in-memory index no longer has a much larger capacity than necessary, but I'm not sure why it also makes a difference on the first copy.
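To illustrate the idea, here is a minimal Python sketch, not the code in this PR. It assumes a slot is a fixed-size array of entries with a linked chain of overflow slots; `Slot`, `SLOT_CAPACITY`, `insert_probing`, `InsertCursor`, and `split` are hypothetical names made up for the example. It counts how many slots are visited when a chain is split with and without a remembered insertion position.

```python
SLOT_CAPACITY = 4  # hypothetical number of entries per slot


class Slot:
    def __init__(self):
        self.entries = []     # holds at most SLOT_CAPACITY keys
        self.overflow = None  # next slot in the overflow chain, if any


def insert_probing(head, key):
    """Baseline: walk the chain from its head on every insert.

    Returns the number of slots visited.
    """
    visited = 1
    slot = head
    while len(slot.entries) == SLOT_CAPACITY:
        if slot.overflow is None:
            slot.overflow = Slot()
        slot = slot.overflow
        visited += 1
    slot.entries.append(key)
    return visited


class InsertCursor:
    """Remembers the slot where the previous insert landed."""

    def __init__(self, head):
        self.slot = head

    def insert(self, key):
        visited = 1
        # Only advance when the remembered slot is full.
        while len(self.slot.entries) == SLOT_CAPACITY:
            if self.slot.overflow is None:
                self.slot.overflow = Slot()
            self.slot = self.slot.overflow
            visited += 1
        self.slot.entries.append(key)
        return visited


def chain_keys(head):
    """Collect every key stored in a slot and its overflow chain."""
    keys = []
    slot = head
    while slot is not None:
        keys.extend(slot.entries)
        slot = slot.overflow
    return keys


def split(head, goes_to_new, use_cursor):
    """Redistribute a chain into two fresh chains.

    Returns (old_head, new_head, total_slots_visited).
    """
    old_head, new_head = Slot(), Slot()
    visited = 0
    if use_cursor:
        old_cur, new_cur = InsertCursor(old_head), InsertCursor(new_head)
        for key in chain_keys(head):
            cursor = new_cur if goes_to_new(key) else old_cur
            visited += cursor.insert(key)
    else:
        for key in chain_keys(head):
            target = new_head if goes_to_new(key) else old_head
            visited += insert_probing(target, key)
    return old_head, new_head, visited


def build_chain(keys):
    head = Slot()
    for key in keys:
        insert_probing(head, key)
    return head
```

Both variants produce the same chains. The difference is only in how many slots each insert touches, and it grows with the length of the overflow chain, which matches the observation that the gain shows up when slots are fuller than expected.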
Also note that it only helps when we underestimate the space needed for the in-memory index. We reserve capacity based on the number of rows divided by the number of hash indices, but keys are assigned to indices by their hash and aren't evenly distributed across the 256 indices.
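The underestimate can be reproduced with a small Python sketch. It rests on assumptions that are not taken from the code: SHA-256 stands in for the real hash function, the sub-index is picked from one byte of the hash, and `sub_index` and `distribution` are hypothetical helpers.

```python
import hashlib

NUM_INDICES = 256


def sub_index(key):
    """Pick one of the 256 sub-indices from a hash of the key.

    SHA-256 and the choice of the first byte are stand-ins for illustration.
    """
    digest = hashlib.sha256(key.to_bytes(8, "little")).digest()
    return digest[0]


def distribution(num_rows):
    """Return (per-index estimate, actual per-index key counts)."""
    estimate = num_rows // NUM_INDICES  # capacity reserved per index
    counts = [0] * NUM_INDICES
    for key in range(num_rows):
        counts[sub_index(key)] += 1
    return estimate, counts


if __name__ == "__main__":
    estimate, counts = distribution(256 * 400)
    over = sum(1 for c in counts if c > estimate)
    print(f"estimate per index: {estimate}")
    print(f"min/max actual:     {min(counts)}/{max(counts)}")
    print(f"indices over estimate: {over} of {NUM_INDICES}")
```

Any sub-index whose count exceeds the estimate has to grow past its reserved capacity, and those are the sub-indices where the splitting cost applies.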