Robin Hood Hashing should be the default hash table implementation (2013) (sebastiansylvan.com)
220 points by doomrobo on Aug 31, 2016 | 53 comments


The post mentions using tombstones for deletion, but I was under the impression that the variance benefits of Robin Hood hashing weren't realized unless you performed a backwards shift on the remaining elements instead of using a tombstone.

See these posts for more information:

http://codecapsule.com/2013/11/11/robin-hood-hashing/

http://codecapsule.com/2013/11/17/robin-hood-hashing-backwar...

Unfortunately there's not great information around the overall CPU performance impact of using a shift vs. tombstone.


I don't know the actual performance impact, but using backwards shifts and removing tombstones leaves the code much simpler.

https://github.com/tmmcguire/rust-toys/blob/master/pony/anag...
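For readers who want the shape of it without reading the linked code, here's a rough, hypothetical sketch (fixed capacity, no resizing, no duplicate-key handling) of Robin Hood insertion plus backward-shift deletion; "DIB" is an entry's distance from its ideal bucket:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Fixed-capacity table; each slot is either empty or holds (key, value).
    struct RobinHood<K, V> {
        slots: Vec<Option<(K, V)>>,
    }

    impl<K: Hash + Eq, V> RobinHood<K, V> {
        fn new(capacity: usize) -> Self {
            RobinHood { slots: (0..capacity).map(|_| None).collect() }
        }

        fn ideal(&self, key: &K) -> usize {
            let mut h = DefaultHasher::new();
            key.hash(&mut h);
            (h.finish() as usize) % self.slots.len()
        }

        // DIB: how far the entry currently in `slot` sits from its ideal bucket.
        fn dib(&self, slot: usize, key: &K) -> usize {
            let cap = self.slots.len();
            (slot + cap - self.ideal(key)) % cap
        }

        fn insert(&mut self, key: K, value: V) {
            let cap = self.slots.len();
            let mut entry = (key, value);
            let mut dist = 0; // DIB of the entry we're currently placing
            let mut slot = self.ideal(&entry.0);
            loop {
                let incumbent = match &self.slots[slot] {
                    None => None,
                    Some((k, _)) => Some(self.dib(slot, k)),
                };
                match incumbent {
                    None => { self.slots[slot] = Some(entry); return; }
                    // "Rob the rich": displace an entry that's closer to home.
                    Some(d) if d < dist => {
                        entry = std::mem::replace(&mut self.slots[slot], Some(entry)).unwrap();
                        dist = d;
                    }
                    Some(_) => {}
                }
                slot = (slot + 1) % cap;
                dist += 1;
            }
        }

        // Backward-shift deletion: no tombstones, probe distances stay tight.
        fn remove(&mut self, key: &K) -> Option<V> {
            let cap = self.slots.len();
            let mut slot = self.ideal(key);
            let mut dist = 0;
            loop {
                match &self.slots[slot] {
                    None => return None,
                    Some((k, _)) if k == key => break,
                    // Passed where the key would have had to live: not present.
                    Some((k, _)) if self.dib(slot, k) < dist => return None,
                    Some(_) => {}
                }
                slot = (slot + 1) % cap;
                dist += 1;
            }
            let (_, value) = self.slots[slot].take().unwrap();
            // Pull the following run back one slot until an empty slot or an
            // entry already in its ideal bucket (DIB 0) is reached.
            loop {
                let next = (slot + 1) % cap;
                let shift = matches!(&self.slots[next], Some((k, _)) if self.dib(next, k) > 0);
                if !shift { return Some(value); }
                self.slots[slot] = self.slots[next].take();
                slot = next;
            }
        }
    }

The tombstone variant would instead mark the removed slot as deleted and skip the shifting loop, which is exactly where the extra probe-length variance comes from.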


I think you can obtain the same variance benefits if you delay the backwards shift until you insert into a tombstone bucket. In that case you backwards shift until the "new" element has a higher DIB than all of the "promoted" ones, or an empty bucket is reached, or a DIB zero bucket is reached. Of course you're not really saving much actual computation this way (you do save a little, since you avoid situations where an element is back-shifted and immediately front-shifted), just moving it around.

(I'm not sure why fast deletions are necessary in the first place?)


A variant of this is the default for Rust's HashMap.

http://codecapsule.com/2013/11/17/robin-hood-hashing-backwar...


See http://cglab.ca/~abeinges/blah/robinhood-part-1/ for more about the implementation in Rust.


This post only touches on the theory, not the implementation. I never got around to writing about the implementation as there was some pending churn at the time.


This is a pity. Please continue if you can; I am hugely interested.


Neat idea! It's using the same generalized strategy as a Splay Tree. Not exactly the same of course, but there's an analogy here. The algorithm achieves a fast amortized (and average) time by continually shortening the "search paths." (In the case of the RH Hash, it's not actually a path, but a sequence of slots.)


Interesting observation - I haven't thought about splay trees for a while, but I had a similar initial thought, that this was effectively an MRU optimization. On reading further though, that's not the point of the RH hash, which is reducing variance in the probe length. The splay operation comes with a lot more overhead than a swap, and the MRU-specific benefits with RH hashing are going to be averaged out over time.


Found the CMU grad! I think this is very different from a splay tree, though. Open-address tables take advantage of the cache by removing linked lists from the data structure, which is why I think they tend to have good real-world performance.

Splay trees, on the other hand, can only really be implemented using pointers and a linked structure (splaying a tree that's been implemented with an array sounds absolutely miserable). They don't have good cache usage, which is why they generally perform poorly outside of big-O notation.


Also, splay trees turn every read into several memory writes. Maybe not a good idea if the data structure is read by multiple threads (cacheline ownership bouncing back and forth all the time, rather than each core having its own cached copy).


Binary Trees are a bad idea from the POV of cache use, and from your analysis, Splay Trees are much worse. However, this has nothing to do with the generalized meta-strategy of continually reducing the amount of work it takes to fetch items. It has to do with why Binary Trees are bad from the POV of cache use.


> Found the CMU grad!

Nope! Didn't go there.

> I think this is very different from a splay tree, though. Open-address tables take advantage of the cache

Well, duh! That's why I said it's analogous. Of course something built on an array is likely going to have better cache behavior! My point is that you're using amortized work to shorten the average amount of work it takes to retrieve your data. My point was to talk about the generalized meta-strategy. Lose a point for poor reading comprehension because you so-badly-wanted-to-correct-someone.


Dial it back man.


One consideration to take into account when choosing your hash table implementation is how susceptible it is to the quality of the hash function used. Hash tables using chaining are pretty robust against low-quality hash functions; using the same hash with a Robin Hood table will land you in a catastrophic case very quickly.

This is the issue I ran into when investigating use of RH hashing in PHP -- the average case had good performance, but there were pathological cases exhibiting extremely high clustering in a mostly empty table. The reason was the low-quality hash (for integer keys the identity function).
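A toy illustration of that failure mode (hypothetical numbers, not PHP's actual table layout): with an identity hash and a power-of-two table, any set of keys that share their low bits piles onto a single slot, and open addressing turns that into one long probe run even when the table is nearly empty.

    fn main() {
        let capacity: usize = 1 << 16; // 65,536 slots, power of two
        // 1000 keys that are all multiples of the capacity...
        let keys: Vec<u64> = (0..1000u64).map(|i| i * capacity as u64).collect();
        for k in &keys {
            // ...so with an identity "hash" they all map to bucket 0.
            let bucket = (*k as usize) & (capacity - 1);
            assert_eq!(bucket, 0);
        }
    }

A chained table degrades here too, but the damage is confined to that one bucket's list; with open addressing the run also slows down every other key whose probe sequence crosses it.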


I was comparing hash performance for a hobby project when I first discovered Robin Hood hashing. I ended up making a demo implementation of the backwards-shifting variant in C# [1] and found that it wasn't the best performer for the load case I needed. I still think it's an incredibly clever algorithm and I would love to find a project where I could use it again.

[1] https://github.com/jzebedee/rhbackshiftdict


I'm curious - what was your deletion strategy?

The codecapsule.com links in the comments here suggest that tombstone deletion actually breaks the low-variance promise from the original paper, and only backward-shift deletion produces the theoretical performance.


Out of curiosity what did your load look like? What was the best performer?


Another approach, slightly fancier than the usual textbook ones, is cuckoo hashing. Has anyone done a comparison of cuckoo vs. Robin Hood?


Lookups in cuckoo tables hit two uncorrelated ("random") locations in the hash table. That's not a problem for hardware implementations that can just read from two different SRAMs. On commodity hardware with virtual memory, it depends. If the table fits in your TLBs, you're probably OK… but at such small sizes, the complexity of Cuckoo hashing and its sensitivity to bad hash functions might not be ideal.

If, however, both accesses are expected to incur TLB misses, you're very likely screwed: handling the misses takes a lot of resources and the vast majority of (commodity) CPUs can only handle one TLB miss at a time. After hashing, the time for lookups is now dominated by two TLB misses; compare to one TLB miss with linear probing. You can also play the odds and only do the second lookup conditionally; that's still ~1.5 random accesses on average (versus 1).

Cuckoo hashing can get you 95-99% occupancy versus 80-90% with simpler schemes. Unless you have specialised size or cost requirements, it seems to me the extra 10-20% space usage is a better choice than double the random accesses.


I lost you in the last paragraph. Are you maybe missing a word, or saying cuckoo where you mean Robin Hood?

Separately, aren't TLB misses pretty fast compared to hitting RAM? I'd think they would only dominate if the TLB itself is too large for cache, which I don't think is common. And if you are using so much memory that this is happening, moving to GB hugepages would solve it.

The "sensitivity to a bad hash function" seems like an odd weakness. For a given quality of hash, wouldn't cuckoo tend to be less susceptible than any single hash approach?

I ask because I'm currently bullish on d-ary cuckoo hashes, and think they'd be a good fit for the fast gather on Skylake.


My last paragraph says you're probably better off with a solution that doesn't achieve the same density but avoids the uncorrelated accesses. In other words, stick to some form of local probing and take the hit in density.

Why would TLB misses be small compared to RAM latency? A TLB miss must be serviced by reading more memory, and misses aren't handled in parallel, unlike random reads to RAM. Sure, you could use very large (1G) pages, but that's a pretty specialised setup that's not available on every platform and tends to require a reboot to enable/tweak. Not something we want to rely on in general.

Cuckoo is particularly sensitive to bad hash functions: if a few elements always hash to the same pair of values, you're screwed. That's particularly problematic with the usual interfaces that don't let the hash table specify a seed to the hash function and expect a machine word: we have to map values to hashes, and remix or split the hashes (in theory, that's defensible as long as the intermediate hash is strong and its codomain is at least (hash set size)^2), but there's nothing to remix away if we have too many values that map to the same intermediate hash. If the hash table calls the hash function with two different seeds, that means double the time spent in hashing, and that overhead can cover for a lot of linear probing.
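A minimal sketch of the "split the hashes" option (assuming a 64-bit intermediate hash; the function name is made up):

    // Derive both cuckoo bucket choices from one 64-bit intermediate hash by
    // splitting it, instead of calling the hash function twice with two seeds.
    fn cuckoo_buckets(hash64: u64, num_buckets: usize) -> (usize, usize) {
        let b1 = (hash64 as u32 as usize) % num_buckets;  // low 32 bits
        let b2 = ((hash64 >> 32) as usize) % num_buckets; // high 32 bits
        // If many keys collapse onto the same hash64, b1 and b2 collide as a
        // pair, and no downstream remixing can pull them apart.
        (b1, b2)
    }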

Simple deterministic probing techniques just take a hit with a bigger cluster than expected; a performance degradation, but the table still works. You can also see theoretical analyses that'll lead you to similar conclusions if you look at the k-independence needed for each hashing technique.

Finally, I don't think gathers are faster than independent memory accesses unless everything is in the same cache line. You don't need SIMD instructions to expose memory level parallelism, you just need independent dependency graphs in your serial computation (until you're bottlenecked on an execution resource that's not duplicated and rarely pipelined, like the TLB miss logic).


> My last paragraph says you're probably better off with a solution that doesn't achieve the same density but avoids the uncorrelated accesses. In other words, stick to some form of local probing and take the hit in density.

Thanks, I was misreading.

> Why would TLB misses be small compared to RAM latency?

Because for recent CPUs (post-P5 for Intel) the page walks to service a TLB miss use the standard data caching mechanisms, thus for a frequently used hash table that is reading only a couple of cachelines per lookup, the page tables usually remain in cache: http://electronics.stackexchange.com/a/67985.

So while the TLB miss requires a lookup, this lookup frequently doesn't require hitting RAM. My recollection is that this means a TLB miss usually costs only the relevant cache miss plus ~10 cycles. But this does require certain assumptions about the access pattern, and I've been meaning to retest this on recent hardware to be sure.

> misses aren't handled in parallel

Based on your earlier phrasing you probably realize this, but in case others don't: since Broadwell, Intel CPUs handle two page walks in parallel: http://www.anandtech.com/show/8355/intel-broadwell-architect....

> Cuckoo is particularly sensitive to bad hash functions: if a few elements always hash to the same pair of values, you're screwed.

Yes, although if you can choose a good hash function this should be rare. And there are variations of cuckoo hashes that are much less susceptible to this: the first increases the number of hashes (d-ary), and the second adds multiple "bins" as described by 'cmurphycode' in another comment. Then you can add a failsafe with a "stash" of last resort: https://www.eecs.harvard.edu/~michaelm/postscripts/esa2008fu...

> If the hash table calls the hash function with two different seeds, that means double the time spent in hashing, and that overhead can cover for a lot of linear probing.

If you can choose your own hash function, the hashing cost should be minimal even for a "perfect" hash. And a SIMD approach usually means that you can create 2, 4, or 8 hashes using different seeds in exactly the same time that you can create a single hash: http://xoroshiro.di.unimi.it/xoroshiro128plus.c

> Finally, I don't think gathers are faster than independent memory accesses unless everything is in the same cache line.

They weren't any faster until Skylake, but they are significantly faster now: https://github.com/lemire/dictionary


They're pretty similar in philosophy. I'd guess that Robin Hood would perform better on average just by virtue of linear probing. If you get a collision in a Cuckoo hash, you're pretty much guaranteed a cache miss. With a Robin Hood hash, it's pretty likely that the element you're looking for is in the same cacheline as the collision.


> If you get a collision in a Cuckoo hash, you're pretty much guaranteed a cache miss.

Don't people usually use buckets with cuckoo hashing, so that each location houses 2 or more values (one right after the other)? I guess I'm not seeing how that's fundamentally different from linear probing...


Why would they do that?

Cuckoo hashing is just a collision resolution technique used in open addressing hashing schemes. Having the two possible hashes of a key always next to each other (that is what you suggest by using buckets, if I understand correctly) would cause the cuckoo algorithm for kicking out conflicting items to stop working.

If both possible locations for a key are always in the same bucket, the two keys with a colliding hash would also share the alternative hash (i.e. it would be the other place in the same bucket). And so, they would only be able to kick each other out. A loop of length 2. Always. If three keys hash into the same bucket, you will have to rehash the entire table.

You can have larger buckets, but the problem remains. Once a big enough number of keys hash into the same bucket, you will have to rehash the table. That would most probably mean low typical load factors.

Of course, what you will be getting in return is the locality of memory references. But is that a reasonable trade-off? I think there are better ways to achieve locality of memory references...


Forgive me, but I think you've misunderstood something here. The most important thing about the cuckoo is that your bucket choice functions (i.e. hashing functions) are independent. If two keys hash to the same first choice bucket, their alternate hashes will (almost certainly) hash to DIFFERENT buckets. For instance, take the SHA-1 of the key. Your first bucket choice is defined by the first x bits, your second bucket choice is defined by the next x bits. But inside a bucket, you can have a bunch of slots.

It works like this. A typical sizing is 4 slots in a bucket, with 2 bucket choices. So your first choice, you arrive at a bucket, and it is just a linear array of length 4. You can put your element anywhere! After a while, when you go to insert into your first choice and you find all 4 slots taken, then you can take ANY of those and try to put it in its alternate bucket. If that alternate bucket is full, you evict from that bucket, too! Thanks to the random nature of your hash functions, it works out very well. 2 bucket 4 slot gets you near 95% occupancy without many evictions on insert.

When you go to lookup, you check your first bucket, and you check all 4 elements until you find it. If you don't find it, you go to your alternate bucket and check all 4 elements. But that's it - only two random accesses (the 4 slots will easily fit in a cache line). It's not so bad to check all 4 elements because the cost is really dominated by loading the line to begin with.
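Sketched out (a hypothetical layout, not any particular library's), the lookup described above is just two bucket scans:

    const SLOTS: usize = 4;

    // Each bucket is a small fixed array, sized to sit in one cache line.
    struct Bucket<K, V> {
        entries: [Option<(K, V)>; SLOTS],
    }

    // "2 hashes, 4 slots": scan every slot of the first bucket choice, then the
    // second. At most two random memory accesses per lookup.
    fn lookup<'a, K: Eq, V>(
        buckets: &'a [Bucket<K, V>],
        key: &K,
        h1: usize, // first bucket choice, e.g. low bits of a strong hash
        h2: usize, // second, independent bucket choice
    ) -> Option<&'a V> {
        for &b in &[h1 % buckets.len(), h2 % buckets.len()] {
            for slot in &buckets[b].entries {
                if let Some((k, v)) = slot {
                    if k == key {
                        return Some(v);
                    }
                }
            }
        }
        None
    }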

I really recommend a read of some of the cuckoo papers, it's cool stuff.


I am aware that using buckets the way you have described works fine for the cuckoo hashing collision resolution technique. But it would not help to ensure the same kind of locality of memory references as you would get using linear probing.

I do not know what exactly the commenter meant by suggesting to use buckets. I assumed (as the conversation is predominantly about the locality of memory references) that they suggested to use buckets in order to avoid two (or more) cache misses. And I just pointed out that it would be a bad trade-off, because then the cuckoo algorithm would only operate within a bucket and the expected load factors would be low.


> And I just pointed out that it would be a bad trade-off, because then the cuckoo algorithm would only operate within a bucket and the expected load factors would be low.

Like 'cmurphycode', I think you might be misinterpreting something about how the buckets would be used. The buckets are in addition to the multiple hashes (which I think some authors call 'ways'). A good bucket size is one sufficient to fill 1 or 2 cachelines, since these are already going to be fetched from memory as a unit.

The idea is that you fetch the full buckets from 2 (or more) independent hashes, and then sift through them linearly without further memory accesses. The terminology for cuckoo hashes is poor, but I think "buckets", "bins", and "slots" are all synonyms depending on whose description you are using. By "using buckets", the idea is to have (at least) two regions in consecutive memory where the item will be if it exists.

I think this paper offers a reasonable terminology: http://www.cs.princeton.edu/~xl/Publications_files/cuckoo-eu.... They use a "num_hashes, num_buckets" pair to describe variations. The classic version is a "2,1-cuckoo": you create two hashes from the key, and look in one bucket in each location. Li et al. suggest that a "2,4-cuckoo" allows much easier insertions at high density.


> The idea is that you fetch the full buckets from 2 (or more) independent hashes, and then sift through them linearly without further memory accesses

Maybe. How do you know what the commenter (erichocean) meant?

If they say that using "buckets" can mean the same locality of memory references as linear probing, their interpretation of what "buckets" mean must deal with that problem in some way.


> How do you know what the commenter (erichocean) meant?

Because "bucket" is a well-defined term-of-the-art with regard to cuckoo hashes, because Erich has demonstrated deep knowledge of similar algorithms in the past, because I've been thinking about improving the memory locality of cuckoo hashes for years, and because I would have used approximately the same phrasing as Erich did to describe the memory locality advantages of "buckets".

> If they say that using "buckets" can mean the same locality of memory references as linear probing, their interpretation of what "buckets" mean must deal with that problem in some way.

Not only do buckets "deal" with locality: because of the granular nature of cachelines and the parallel abilities of memory controllers, searching two small compact regions turns out to be about as fast as searching one region of twice the size. Vectorized linear probing and SIMD cuckoo hashes are so similar that your question very likely indicates that you are not using the same terminology as others are.

Here's paper that offers a good comparison:

  Rethinking SIMD Vectorization for In-Memory Databases
  Polychroniou, Raghavan, and Ross (2015)
http://www.cs.columbia.edu/~orestis/sigmod15.pdf


It's the same problem though. As the other comment said, you have 2 uncorrelated memory locations. If you check the first one and your element isn't there, then checking the second one will probably incur a cache miss.

The difference between linear probing and buckets is a big one: buckets are linked lists. That means that hitting a bucket doesn't pull everything else inside the bucket into the CPU cache. So iterating through the bucket may incur a cache miss for every element. On the other hand, linear probing in open addressing does away with buckets and instead searches the neighbors of the collision. Because they're nearby in memory, it's a lot quicker to search.


You can have preallocated buckets in an array. What's the difference you ask? When you have more elements per buckets, the variance becomes smaller as a fraction of the bucket's size. Imagine you had a hash table with buckets of 1000 elements. You'd probably feel comfortable with a 90% load even without any strategy to handle bucket overflow except growing and rehashing. Obviously, the downside is that we have to do more work to search in buckets.


The overhead of storing hashed values inside the hashtable is not to be neglected.

If my key type has a fast hash function, maybe I don't need to store the hashed value. But if e.g. I have a hashset of char*s, I definitely don't want to rehash. Storing those hashes alongside the pointers effectively halves my load factor, from a cache-friendliness point of view. There goes the whole load-factor advantage.
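To put rough numbers on that (a hypothetical 64-bit layout):

    // A hash set of char*-style keys: 8 bytes per slot without a cached hash,
    // 16 bytes per slot with one, so half as many slots fit in a cache line.
    struct SlotRehash {
        key: *const u8, // rehash the pointed-to string on every probe comparison
    }

    struct SlotCachedHash {
        key: *const u8,
        hash: u64, // cached hash makes probe comparisons cheap, but doubles the slot
    }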

Maybe Robin Hood hashing still wins by letting me use linear probing, which is more cache-friendly than (say) the quadratic probing you'd normally have to do. But this is getting sketchier...

In addition, it's not clear to me that the cost of the swaps is entirely negligible in the common case when the elements of the hashtable are more than one or two words wide. Certainly not if you have to run a constructor to do the move instead of just memcpy'ing the elements.

I like the idea, but it is not (to me) obviously a knockout win.


If you don't need to store the hash value because your hash function is simple, you don't… Nothing about the (linear) robin hood collision handling strategy forces an explicit representation of the hash value. However, it is true that C++ makes some data structures slower than they should be.


My data structures and algorithms course was unfortunately taught by a pretty awful professor.

But out of the entire course, he did have us implement a hash table using Robin Hood Hashing for a homework assignment once. It's the one part of the course that I enjoyed outside of my issues with it, and it's the one that, a couple years down the road, I still like to think and talk to coworkers about because it's an interesting topic.



The author's thesis has some comparisons to Brent's method. See P51-53 https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf for conclusions.


If I remember correctly, Brent's variation considers both the chain of the element being inserted and the chains of the elements being swapped, and then uses dynamic programming to decide exactly what to swap. It sacrifices some insertion efficiency for fewer expected probes on lookups.


RHH is neat, and useful, but I don't know about making it your go-to hash implementation: if you're writing the code yourself, and don't have to worry about the cost of the occasional cache miss, the simplicity of list or B-tree chaining is a pretty big win.

However, if you're going to implement a general-purpose hashtable, or are in an environment where you really cannot afford that cache miss, RHH or similar seems like the right way to go.


He also wrote a follow-up article addressing performance issues around probe lengths increasing when erasing elements.

http://www.sebastiansylvan.com/post/more-on-robin-hood-hashi...


There is also a great resource with more up-to-date stuff on hashing and benchmarks right here: https://github.com/rurban/smhasher/


That compares hash functions, not hash tables (aka dictionaries, maps, key-value stores, etc.).

Hash functions convert sparse keys to dense keys (with potential for collisions). Hash tables look up and store values by key using their hash (usually in-memory only, although this is not a requirement per se).


It's also a terrible comparison of hash functions for the purpose of hashmaps, because it tries to reduce the performance characteristics to a single number. The performance characteristics of a hash function vary with the size of the input. FarmHash is super complicated because it's basically a huge stable of hash functions that are selected based on input size.

For instance, looking at these numbers you might conclude XXHash is strictly faster than FNV, but in reality FNV performs much better for the more common case of small inputs (breaking even around 16-32 bytes).
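For reference, 64-bit FNV-1a is small enough to quote in full, which is the point about short keys: a couple of instructions per byte and essentially no setup or finalization cost.

    // 64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime.
    fn fnv1a_64(data: &[u8]) -> u64 {
        let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
        for &byte in data {
            hash ^= byte as u64;
            hash = hash.wrapping_mul(0x100000001b3); // FNV prime
        }
        hash
    }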


I disagree, as this simple test suite is not meant to be a definitive guide on which hash function is the best.

I would venture to say that your criticism is similar to saying that the Hutter Prize[1] is a terrible idea for someone who will need to do compression on a dataset as they will choose to use PAQ and it will run really slowly. Yes, there are trade-offs and optimizations for different use cases, but to say that it's a terrible comparison is unfair.

1) http://prize.hutter1.net/


I understand there are differences between hash functions and hash tables (and between implementations of hash maps, dictionaries, etc). I do not understand the need to down-vote the comment, which I think is relevant to the discussion, as implementing a hash table requires choosing a hash function along with strategies for collision avoidance, collision handling, resizing, chaining, etc.


It's no surprise that SipHash from DJB is so commonly used as the default hash function for hash tables.

It's resistant against hash-flooding attacks and really fast.


It's not. It's not resistant against hash-flooding attacks, and it's the slowest hash function in wide use. It has some good hash function qualities, but none of its qualities would qualify it as a good hash table function. The authors only compare its speed against secure hash functions like MD5 and SHA*, but then talk about hash tables, where it competes against FNV, Murmur and City. (City was replaced by Farm some years ago, btw.)

Below, some commenters quote my smhasher README, where I tried to lay out some common sense.

There are secure hash functions and there are secure hash tables; the latter is the topic we are talking about here, starting with Robin Hood. But those two topics are mostly orthogonal. A 32-64 bit hash function can never be called secure; it's trivial to brute force. A secure hash function doesn't help a hash table that only uses 32 bits of it, of which just 7-15 bits matter for an attack. Get over it and use proper hash tables, and esp. not linked lists. We are not in the eighties anymore.


From the SMhasher page:

> The hash table attacks described in SipHash... cannot be the problem of the hash function, but of the hash table collision resolution scheme. You can attack every single hash function, even the best and most secure, if you detect the seed, e.g. from the sort-order, so you need to protect your collision handling scheme from the worst-case O(n)...

> I.e. the usage of siphash for their hash table in Python 3.4, ruby, rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. siphash is not secure enough for security purposes and not fast enough for general usage. ... Provably secure is only uniform hashing, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic or linear collision scheme, such as Robin Hood or Cuckoo hashing.


> if you detect the seed, e.g. from the sort-order,

Good luck detecting a 128-bit secret key from the sort order.

> State-recovery.

> A simple strategy to attack SipHash is to choose three input strings identical except for their last word, query for their respective SipHash outputs, and then "guess" the state that produced the output v1 ⊕ v2 ⊕ v3 ⊕ v4 for one of the two strings. The attacker checks the 192-bit guessed value against the two other strings, and eventually recovers the key. On average 2^191 evaluations of SipRound are computed.

> Internal collisions.

> As for any MAC with 256-bit internal state, internal collisions can be exploited to forge valid tags with complexity of the order of 2^128 queries to SipHash. The padding of the message length forces attackers to search for collisions at the same position modulo 256 bytes.

https://131002.net/siphash/siphash.pdf


You don't need luck, you just need enough timing data on the collisions and ordering, and then use a solver iteratively to produce test cases to narrow down the seed. With normal-sized hash tables and measurable collision timings you work with 10 bits. It's a matter of a few hours of CPU time. Exploiting public JSON interfaces, e.g., sounds easiest.

My claim stands that SipHash for hash tables is pure snake oil. Use a proper hash table instead, esp. when they are faster anyway. Linked lists are 1980s technology. Even djb can make mistakes. But to his rescue, his old solution to this problem is still one of the best. I would rather blame the other guy who came up with these questionable recommendations, and esp. the complang guys for trusting snake oil.


I think that comment is a bit extreme.

> You can attack every single hash function, even the best and most secure if you detect the seed, e.g. from the sort-order, so you need to protect your collision handling scheme from the worst-case O(n), i.e. separate chaining with linked lists

or protect the seed from being revealed. Timing attacks can be used to reveal the seed in some situations, but not all.

It also says "for security purposes", which is pretty vague. It's certainly not secure enough for a security-related application. But for most use cases it's fine when you want to just have some protection against map DOSes. Not being useful for security-related applications doesn't make it security theater.

It also says that it's useless in Rust (among others), and then goes on to say that robin hood hashing is a solution. Rust has had robin hood hashing for years.


I wrote that. It is not extreme.

"Security purposes" is mostly attacking routers, dns daemons or kernels via hash flooding which don't have that bad hash tables as the mentioned languages. Interestingly those devs are snake oil resistant.

Security theatre is the claim by siphash and its proponents. The siphash paper analyzes a very small part of hash table usage and then wildly exaggerates its logic. People read it as if the claims in the section before "7 Application: defense against hash flooding" applied to hash tables as well. No, they are talking purely about the hash function there. But then in section 7 they act as if all hash tables were implemented with the simplest and worst scheme of all, chaining with linked lists. Only the worst hash tables use this collision resolution scheme. It is slow and it is "insecure" by default. There is no need to add a pseudo-secure layer of siphash on top of it, as it does not help. The ideas about how such a hash table would be attacked are from 2003. We are now in 2016, and people are using efficient SAT solvers (first just z3 in python, then scripts which produce CNF, and eventually cbmc, which can use the siphash code verbatim), not the simple ad hoc attempts they describe. Their analysis is partially sound, but the result and claim are completely wrong. We are just lucky that such hash flooding is uncool, as attacking an application service is nowadays much easier. You can read about it here almost every other day.

Rust uses a uselessly slow hash function even though it was already protected by the balancing characteristics of Robin Hood, so switching to siphash does more harm than good. It's using a 10-100x slower hash function, which is completely useless. Count the average collisions before and after, and compare the time won there against the time lost in the hash function. It's a dramatic loss.

It wasn't a dramatic loss in python, no idea why. I guess they have some other grave mistake in their hash table. It is a dramatic loss in perl5. Even if their hash table is the worst of all open source projects. Well, at least they randomize the iterator now. Before they used a zero-invariant function (trivial to attack by adding \0), and didn't count collisions when fighting collision attacks.



