Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace std::hash with our custom hash function #2952

Merged
merged 1 commit into from
Feb 27, 2024
Merged

Replace std::hash with our custom hash function #2952

merged 1 commit into from
Feb 27, 2024

Conversation

benjaminwinger
Copy link
Collaborator

@benjaminwinger benjaminwinger commented Feb 26, 2024

std::hash is implementation dependent and not guaranteed to be the same between different stdlibs, which means our current hash index layout is not always portable.

std::hash also apparently is not required to produce the same result for the same input in different runs of the same program, and while there hasn't been any evidence of that yet, it could cause issues in future.

From https://en.cppreference.com/w/cpp/utility/hash:

Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision denial-of-service attacks.

Fixes #2943.

I did a quick benchmark on a long string (first five paragraphs of lorem ipsum), and this took 123ns/iteration on average, compared to 117ns/iteration with std::hash, which is fairly reasonable.

Copy link

codecov bot commented Feb 26, 2024

Codecov Report

Attention: Patch coverage is 84.21053% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 93.51%. Comparing base (d2cfc65) to head (f127718).

Files Patch % Lines
src/include/function/hash/hash_functions.h 84.61% 2 Missing ⚠️
src/include/storage/index/hash_index_builder.h 66.66% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2952   +/-   ##
=======================================
  Coverage   93.51%   93.51%           
=======================================
  Files        1121     1121           
  Lines       42913    42919    +6     
=======================================
+ Hits        40131    40137    +6     
  Misses       2782     2782           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -133,31 +134,52 @@ inline void Hash::operation(
template<>
inline void Hash::operation(
const double& key, common::hash_t& result, common::ValueVector* /*keyVector*/) {
result = std::hash<double>()(key);
// 0 and -0 are not byte-equivalent, but should have the same hash
if (key == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if adding 0.0 can also solve the "0 and -0 are not byte-equivalent" problem without an if/else branch.

auto newKey = key + 0.0;
result = murmurhash64(*reinterpret_cast<const uint64_t*>(&newKey));

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does appear to also work.

@ray6080 ray6080 merged commit 87b4348 into master Feb 27, 2024
15 checks passed
@ray6080 ray6080 deleted the hash-fix branch February 27, 2024 06:04
@ray6080
Copy link
Contributor

ray6080 commented Feb 27, 2024

@benjaminwinger I merged this, so it can be in our dev build to be verified by the bug reporter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index lookups on string primary keys fail on Linux when the database files are created on MacOS
2 participants