- malleable insert
- grows
- shrinks
- lookup
- remove
- meaningless name
- bugs that will not be fixed. This is just an attempt to bully other people into doing extendible hashing, but better than I did. (Please don't just copy and pasta the ideas presented here or I will call you a fraud.)
- it just works
The minimal perfect hashing variant of this map.
It consists of a set of small hash tables and an array mapped trie. The trie looks at the hash value of a key and tells us which of the small hash tables that key belongs to. This trie lookup is just a single array index into the trie array. This additional bit of structure opens up all kinds of opportunities.
This implementation stores 16 bits of the hash value for each entry: 8 bits of "tophash" and 8 bits of "triehash". Triehash exists to speed up growing the hash table. Both parts are used when probing to find the matching key without having to look at the actual key too much; tophash exists solely for that.
By storing the 8 bits of triehash, the growing mechanism of splitting can be done without always having to recalculate hashes for the keys. When splitting, half of the entries of the small hash table are moved to a new hash table and the trie is updated to point to both. The triehash tells us which of the two hash tables each key belongs to. A Finnish table can double in size multiple times before having to recover the full hash value. With this implementation you get to store just above ~3k key-value pairs before your hash function is needed to hash the keys a second time. And from then on it's every 4th split.
Growing involves data that is mostly already in the CPU caches, because it was loaded during the preceding linear probing, and there's only very little of it (small hash tables are small). The cache pollution that a single growth-causing Put can do is pretty much limited to 2x the size of the small hash table that became full.
The table also shrinks when it has grown big. It doesn't allocate or hash keys when shrinking.
Unlike with the built-in Go map, the initial size hint doesn't get rounded up to the next power-of-two number. Only the trie needs to have a power-of-two size, not the number of buckets per small hash table, so Finnish table adjusts the buckets-per-table count to allocate much closer to the minimal amount of space for the hinted number of entries. Right now you get a load factor of 0.75 after inserting the hinted number of entries.
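The size-hint math could look roughly like this. The constants are made up for the sketch; the point is only that the table count must be a power of two (for the trie) while the buckets-per-table count can be chosen exactly:

```go
package main

import "fmt"

const slotsPerBucket = 16 // assumption
const maxBuckets = 8      // assumption: bucket cap per small table

// sizeFor picks a power-of-two table count and then an exact (not
// power-of-two) buckets-per-table count so that `hint` entries land
// near a 0.75 load factor.
func sizeFor(hint int) (tables, buckets int) {
	needed := hint*4/3 + 1 // total slots for ~0.75 load at the hint
	tables = 1
	for tables*maxBuckets*slotsPerBucket < needed {
		tables *= 2
	}
	buckets = (needed + tables*slotsPerBucket - 1) / (tables * slotsPerBucket)
	return
}

func main() {
	t, b := sizeFor(1000)
	fmt.Println(t, b, t*b*slotsPerBucket) // 16 6 1536
}
```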
Here Todd Howard sells you the same game data structure for the third time.
Look at the BenchmarkLoadFactor test to see how loaded the structure stays. For the current implementation the load factor gracefully glides up and down between 0.5 and 0.8 for various sizes, with no big jumps between two sizes.
BenchmarkShortUse simulates the absolute classic usage of hash maps: you create one, populate it, do some lookups, and then throw it away! Here it is compared against the built-in Go map. So: create an empty map (with no size hint), insert size unique entries, and then look up all those size entries. Repeat that times times and there you have one "op." Keys and values are both 64 bits.
fish=false&size=7&times=10000       2199945 ns/op    1920000 B/op    20000 allocs/op
fish=true&size=7&times=10000        3544430 ns/op    4000000 B/op    20000 allocs/op
fish=false&size=33&times=1000       2846731 ns/op    2422751 B/op     9047 allocs/op
fish=true&size=33&times=1000        1922391 ns/op    1922391 B/op     8747 allocs/op
fish=false&size=121&times=1000     11314968 ns/op   11016912 B/op    20937 allocs/op
fish=true&size=121&times=1000       7089865 ns/op    7922416 B/op    15904 allocs/op
fish=false&size=795&times=1000     59302975 ns/op   45040234 B/op    56524 allocs/op
fish=true&size=795&times=1000      39831838 ns/op   32084966 B/op    23752 allocs/op
fish=false&size=1092&times=1000    90666254 ns/op   86812278 B/op    66993 allocs/op
fish=true&size=1092&times=1000     58372264 ns/op   49989331 B/op    29632 allocs/op
fish=false&size=1976&times=200     34487428 ns/op   34691945 B/op    20330 allocs/op
fish=true&size=1976&times=200      21424161 ns/op   17471373 B/op     8592 allocs/op
fish=false&size=2359&times=200     37926931 ns/op   34755320 B/op    20636 allocs/op
fish=true&size=2359&times=200      23959959 ns/op   17494322 B/op     8613 allocs/op
fish=false&size=6215&times=200     92299503 ns/op   71584121 B/op    43602 allocs/op
fish=true&size=6215&times=200      64014923 ns/op   32748146 B/op    14135 allocs/op
fish=false&size=500000&times=5    432175780 ns/op  218992418 B/op    96058 allocs/op
fish=true&size=500000&times=5     304238243 ns/op   95116124 B/op    33081 allocs/op
fish=false is the built-in map, fish=true is Finnish table. Please do note that my preferred CPU is Intel/Haswell.
An extendible hashing hash table does not use one big memory allocation for its backing storage; instead it consists of many (mostly) fixed-size small allocations. This may have some sort of impact on your allocator/GC/whatever_runtime_you_have. Also, when growing by splitting, little to no memory is thrown away: the small hash table that became full and is now being "split" is re-used as the other half.
For example, below are benchmark results showcasing the difference in the number of GC cycles and stop-the-world time between the built-in Go map and Finnish table. Here one op is inserting size key-value pairs into the hash table. We let the GC run on multiple CPU cores, as one does in real software, so it messes things up a bit, as it usually does.
ext=false&size=263 55346 ns/op 290.0 gc-stw-ns/op 0.005799 gc/op
ext=true&size=263 49792 ns/op 158.2 gc-stw-ns/op 0.002850 gc/op
ext=false&size=723 179984 ns/op 810.7 gc-stw-ns/op 0.01176 gc/op
ext=true&size=723 127750 ns/op 492.8 gc-stw-ns/op 0.008284 gc/op
ext=false&size=1902 512626 ns/op 2852 gc-stw-ns/op 0.04754 gc/op
ext=true&size=1902 331951 ns/op 1918 gc-stw-ns/op 0.03419 gc/op
ext=false&size=6021 1535226 ns/op 7070 gc-stw-ns/op 0.09988 gc/op
ext=true&size=6021 1216736 ns/op 3991 gc-stw-ns/op 0.05848 gc/op
ext=false&size=10518 2488954 ns/op 12123 gc-stw-ns/op 0.1983 gc/op
ext=true&size=10518 2292961 ns/op 6710 gc-stw-ns/op 0.09808 gc/op
ext=false&size=39127 9071257 ns/op 53570 gc-stw-ns/op 0.9932 gc/op
ext=true&size=39127 8977597 ns/op 24805 gc-stw-ns/op 0.3718 gc/op
ext=false&size=124152 30414805 ns/op 53572 gc-stw-ns/op 1.024 gc/op
ext=true&size=124152 29231045 ns/op 81096 gc-stw-ns/op 1.360 gc/op
ext=false&size=2500000 381788105 ns/op 346223 gc-stw-ns/op 2.667 gc/op
ext=true&size=2500000 245723356 ns/op 111994 gc-stw-ns/op 2.600 gc/op
The nice (but hard to benchmark) thing about having small(ish) allocations is that when the hash table is freed then its backing memory is not one big chunk of bytes that will never fit anyone else's needs. Smallish allocations have a much larger likelihood[^1] of being reused by some other part of the program. Especially if all parts of your program strive to allocate chunks of the same size.
Plenty of cycles are wasted on calling the functions written in asm. Compiling with "--tags nosimd" is usually faster, as it just does SWAR written in Go. For any curious reader: the above benchmarks all call into the asm.
Extendible hashing gives us a shitty opportunity to do locality-sensitive hashing, so that keys that are "near" each other also end up close to each other in memory. When doing inserts/lookups/whatever in key order, your CPU caches remain useful. It just about works when inserting 64-bit integers from 0 to 100 million:
insert8ByteKey8ByteValue/ext=true&size=100000000&lsh=false 13793889356 ns/op 137.9 ns/insert
insert8ByteKey8ByteValue/ext=true&size=100000000&lsh=true 6970518291 ns/op 69.7 ns/insert
and then for lookups:
lookup8ByteKey8ByteValue/ext=true&size=100000000&lsh=false 16412172661 ns/op
lookup8ByteKey8ByteValue/ext=true&size=100000000&lsh=true 9062749180 ns/op
[^1]: This was revealed to me in a dream. Beyond my cryptic dreams I'm pretty sure that there's a paper out there that repeats this message. One could of course just look at the allocator/GC code and think for oneself from that.