Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better Hashmap tests #82

Open
rurban opened this issue Nov 13, 2019 · 9 comments
Open

Better Hashmap tests #82

rurban opened this issue Nov 13, 2019 · 9 comments
Assignees

Comments

@rurban
Copy link
Owner

rurban commented Nov 13, 2019

Originally I also checked various other much faster linear probing hashmaps, such as e.g. from rigtorp or the two swisstable variants (Google and Facebook), but I settled with the slow standard (seperate chaining) for fair comparison purposes.
Maybe add other ones, and document them.

@dumblob Speaking about currently worldwide available hashmaps, definitely skim the comparison from Martin Ankerl https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-05-conclusion/ to notice one very important thing. Namely that some hashmaps are almost not at all sensitive to collisions and throughput and latency (Throughput versus latency) of the hash function, but other hashmaps rely completely on the quality and throughput and/or latency of the hash function.
To be fair in showing real use case scenarios, we would actually need to test one representative from the sensitive corner and one from the insensitive corner.

@rurban True. There should be a fast, sensitive linear probing also, besides the usual insensitive std chaining. About a robin_hood I'm not sure. It's much more stable than linear probing, and from outside just a fast chaining variant.
we need to add the stddev to the mean cycles. I'll also add the initial insert/delete timings, but without trials and outlier removal.
For poor hash functions add better security measures, like collision counting or tree-conversion.
None of the usual hashmaps are secure, so maybe add fixed versions and test the overhead.

The goal should be to choose a good hash function, small (inlinable!) and not too many collisions, based on the timings, for slow <unordered_map> and fast hash tables. With string keys, not numbers.

Parts taken from the discussions at #80 and #61

@rurban rurban self-assigned this Nov 13, 2019
@wangyi-fudan
Copy link
Contributor

these results are a bit old @data-man .
https://github.com/ktprime/emhash is a good place for hashmap benchmark. I stick on ska::bytell_hash_map for speed and space.

@data-man
Copy link
Contributor

@wangyi-fudan

these results are a bit old

You can always generate new results. :)

@dumblob
Copy link

dumblob commented Nov 13, 2019

@wangyi-fudan just FYI Martin Ankerl in the link above compared emhash and commented it like this:

If you are dealing with small maps and integral data types as keys, emilib1::HashMap might be fastest. But be aware that I cannot vouch for it’s stability. It is very new, and while integrating it the author still had to fix a few kinks to get it working in my benchmark. If that does not bother you, it’s search performance is very fast.

If you want a more stable map with very good performance, I’d go for absl::flat_hash_map as the default map. It is very stable, has lots of features, and definitely well tested.

@rurban
Copy link
Owner Author

rurban commented Nov 17, 2019

I'm more leaning to https://github.com/greg7mdp/parallel-hashmap,
an improved abseil/swisstable variant, an open-addressing header-only C++ hashtable with quadratic probing. But it does it's own mixing for bad hash functions. Maybe disable that.

I'm not so sure our tested hashfuncs are actually inlined with the current HashMapTest. Definitely need to be templated with new pfhash classes, and not via std:function indirect calls.

rurban added a commit that referenced this issue Nov 18, 2019
fails for many hashes.
See GH #82
rurban added a commit that referenced this issue Nov 19, 2019
fails for many hashes.
See GH #82
rurban added a commit that referenced this issue Nov 19, 2019
fails for many hashes.
See GH #82
rurban added a commit that referenced this issue Nov 20, 2019
fails for many hashes.
See GH #82
rurban added a commit that referenced this issue Dec 4, 2019
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Dec 4, 2019
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Dec 6, 2019
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Dec 9, 2019
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Dec 13, 2019
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Dec 14, 2019
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Dec 15, 2019
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Dec 16, 2019
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Dec 17, 2019
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Jan 18, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Jan 20, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Jan 23, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Feb 7, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Feb 16, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Mar 7, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Aug 14, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Aug 14, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Aug 14, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Aug 16, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Aug 19, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Aug 23, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Aug 24, 2020
to help inlining the hash function. See #82
rurban added a commit that referenced this issue Aug 24, 2020
to help inlining the hash function. See #82
@thomasahle
Copy link
Contributor

I think it would be nice to disable quadratic probing as well.
Being able to handle a pure linear probing is a classical hash function test, which probably requires a quite strong (5 independent) hash function. Multiply shift should certainly not be able to handle linear probing, but it seems to do alright in the current parallel hashtable tests.

@rurban
Copy link
Owner Author

rurban commented Sep 4, 2020

Yes. The idea was to test it with the common default hashtable, and a good modern one. Adding a trivial linear probing or even a chained one is a good idea, because so many people are still using them.

@thomasahle
Copy link
Contributor

Some would even say that if you have a good enough hash function, quadratic probing and tree-fallbacks only slow you down 😉

@rurban
Copy link
Owner Author

rurban commented Sep 5, 2020

Others like me would argue that only the poor hash functions lead to acceptable hash table speed. Good hashes are for the weak. But it depends entirely on the usecase and the hash table impl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants