LongNeighbors #45

injinj · 2019-03-17T08:50:55Z

@hmakholm has a smhasher test called LongNeighbors. This test creates similar strings by flipping bits and looking for collisions. I ran Meow 128 through this test and it fails that.

A description of this test is LongNeighborTest

My driver for Meow in the smhasher test is here smhasher, specifically, this MeowTest.cpp

I don't think there is a bug there, but maybe... I was able to modify the Meow algo to pass this test, but I don't think it would be acceptable: doubling the number of AES rounds. This causes the throughput to be 50% of the current version (smhasher reports 24MiB vs 48MiB unmodified).

It's easy to recreate this test under Linux:

$ git clone https://github.com/injinj/smhasher
$ cd smhasher
$ cmake src
$ make
$  ./SMHasher Meow128 -o                                                                                                   
-------------------------------------------------------------------------------
--- Testing Meow128 (Meow 128-bit)

[[[ Keyset 'TestLongNeighbors' Tests ]]]

Looking for 2-bit collisions in the last 2400 bits.
Looking for 4-bit collisions in the last 2048 bits.
Looking for 6-bit collisions in the last 160 bits.
Trying bases of length 10 to 300, 5 of each length.
......[16]...............[32].....
This horrible collision has surprise score 4.14258e+25 and length 38 bytes:
000: B3 C1 BA 57 71 18 B2 6B AE C8 3A C1 A9 04 C4 E3 3B F6 F0 62 84 1F 78 99
                                                          ^D2^     ^1B^
018: BD CF 10 8E 9A 73 32 4F 29 17 84 B8 86 33
    ^B5^                 ^4D^     ^8C^
The hashes are length 16 bytes:
000: B9 4B AF 11 1D DD 42 F0 15 78 BA CA 97 06 7E F8
....
This horrible collision has surprise score 4.37134e+25 and length 42 bytes:
000: 99 AD 12 8C FE E7 7D 2E 17 89 47 27 AC C6 75 95 FE CF 12 8C 95 8B 91 4D
                                                                      ^93^
018: 01 BC 85 56 D9 50 FE 6D A5 0F 89 77 E2 EF A5 B6 87 E7
       ^3C^     ^DB^                 ^F7^     ^85^     ^67^
The hashes are length 16 bytes:
000: 42 4A 96 AC 8A A1 E9 FF 44 12 80 C6 20 82 CD DA
.
This horrible collision has surprise score 4.61799e+25 and length 43 bytes:
000: 90 A5 12 8C BF E5 7F 2E 57 CB 05 27 E4 CE 35 95 F7 C7 12 8C D7 99 91 5D
018: 41 BE C7 54 D3 18 BE ED AC 47 8B 75 EB 37 07 A4 9C 64 B8
    ^C1^                 ^6F^                       ^DC^
The hashes are length 16 bytes:
000: 9B E8 1A 94 69 45 EB BC EF 73 19 4F C9 38 A6 10
...
This horrible collision has surprise score 6.49333e+25 and length 46 bytes:
000: BD 8D 12 8C FB EE 75 2E 17 80 4E 26 8C E7 75 94 DA EF 12 8C 95 CA 91 0D
018: 08 BD 8D 5F F2 78 FF 64 88 27 88 3F 8E 8F CE FC E2 8A 99 CE 2B E3
                ^D2^                 ^7F^              ^18^     ^0B^
The hashes are length 16 bytes:
000: 35 C1 E6 5B 6B 66 C7 28 52 CB 9D 10 E1 87 AD A6
..[48]...............[64].

Among 285 tested bases, we expected 2.30799e-25 bads. Actually there were 5.
*********FAIL*********

erthink · 2019-03-17T09:31:18Z

This is expected, roughly as I predicted in the comments (1, 2).

injinj · 2019-03-17T10:09:16Z

This is expected, roughly as I predicted in the comments (1, 2).

Ah yes, nice analysis.

cmuratori · 2019-03-28T03:21:07Z

We have addressed this in v0.5, which we will post as soon as we've finalized a few last things. Many thanks for the Long Neighbors test - I run your smhasher branch now as well as the other ones to make sure we pass them all :)

I will close this issue once v0.5 is posted and you can verify independently that we have no issues on your smhasher branch.

- Casey

injinj · 2019-03-31T21:26:53Z

Please note, the author of Long Neighbors to @hmakholm. I'm just a chimp with a keyboard. It is a cool way of attacking hash functions.

svpv · 2019-04-27T23:32:52Z

@leo-yuriev t1ha2_atonce128 also fails the LongNeighbors test, and has been disqualified by @rurban. Is there a way to fix it?

rurban · 2019-04-28T04:45:48Z

I would not call it "disqualified". I mostly use poor hash functions, when I know that the quality does not matter much. It just fails a test. This test came after the hash, so you can expect old hash functions to fail it. Newer hash functions don't.

svpv · 2019-04-28T10:12:45Z

@leo-yuriev @rurban I noticed that the test doesn't fail with a non-zero seed (I simply added seed++ in t1ha2_atonce128_test; I then tried a few other non-zero seeds). So what does that mean? Are there still simple patterns in the input which result in collision, it's just harder to find them with a non-zero seed? Or does the seed make the patterns go away completely?

hmakholm · 2019-04-29T15:37:53Z

@svpv -- among the kinds of inputs the LongNeighbors test tries to find collisions between are long sequences of zero bytes with just a few bits set near the end. Some hashes have particular problems with inputs of that shape. If a hash has this problem and it treats its seed more or less in the same way as the input, it is quite plausible that the problem is only detected for nonzero seeds.

Once the test has shown that it is possible for the hash to get into a state where a few bits of difference in the remaining input can yield a 128-bit collision, it's prudent to assume such a state can also be reached with a nonzero seed -- but the input might need to be just right to achieve that, so the test probably won't chance upon it. The mere existence of such a state should lead to skepticism about the hash for purposes where extreme rarity of random collisions matters.

cmuratori · 2019-05-08T06:00:43Z

(While we are on the subject of LongNeighbors, I am finalizing Meow v0.5 right now and it is a bit haphazard because there are so many SMHashers. I currently test on 4 different SMHashers just to make sure there's nothing we don't pass cleanly, but it is somewhat cumbersome especially because not all of them report failures in a way that you can read at the end, so you have to scroll through the entire output sometimes to make sure nothing failed. Is there maybe a plan to someday unify the SMHashers back into one single one that we can count on to have all the latest sauce, and which clearly prints any suspicious things at the end? I ask here only because two SMHasher fork people are already on this thread - @injinj and @rurban.

In general, it isn't the end of the world if it stays the way it is, because we really only need it at the very end when we're pretty sure the hash function is solid, so it's rare that anything fails. But even so, it would be nice to have a more fire-and-forget way of validating through all SMHasher variants).

- Casey

rurban · 2019-05-08T08:41:23Z

LongNeighbors still has a licensing problem, thus it cannot yet be published. so I can have it only in a branch. see rurban/smhasher#58

@injinj's gnuplot additions and yves https://github.com/demerphq/smhasher/ similar svg and TAP additions are also cool, besides the support for extra seed size in the structs. I have to add extra init code for such hashes so far.

injinj · 2019-05-08T13:36:32Z

I am using my smhasher fork for examining the hash algos that interest me. It's too much work to what @rurban smhasher is doing without having folks do pull requests. And then when you have that many hashes, you are essentially maintaining abandoned hashing projects except for the few that evolve with new CPU features (like SHA instructions), etc.

cmuratori · 2019-05-18T22:16:35Z

As of the v0.5 branch which is now uploaded, we pass LongNeighbors so I'm closing this issue. LongNeighbors at 128-bit and 64-bit are part of the standard testing I do before we update, too, so hopefully there will be no regressions that slip through on that, either.

- Casey

injinj · 2019-05-19T16:52:32Z

I forked the 0.4 branch and updated it to use 2x AESDEC rounds here: https://github.com/injinj/meow_hash This also passes LongNeighbors and is faster for small keys, but slower than v0.5 in the bulk speed test. I am going to use this version for other things.

cmuratori · 2019-05-19T18:20:34Z

v0.5 is trying to do a lot more than pass LongNeighbors. It tries to prevent seed discovery, and allows full 128-bit seeding. So it probably won't ever get to the point where it is as fast as v0.4 with double AESDEC for very small keys, but it's not necessarily a good idea to use v0.4 with double AESDEC because I don't know that anyone has checked to see if that is secure. I suspect it probably isn't, but I haven't looked.

- Casey

injinj · 2019-05-19T18:34:38Z

I added your comments to the fork README. I'll still probably use it, the speed is fantastic with hyperthreading disabled, nearly as fast as the crc_c instruction.

NoHatCoder · 2019-05-19T19:08:39Z

Quick take: If you change it to xor the "Mixer" into the 4 initial states you probably have a decent level 2 hash, I currently have a probability 1 in 2^80 collision, but of course worse may be possible.

injinj · 2019-05-20T18:04:47Z

Quick take: If you change it to xor the "Mixer" into the 4 initial states you probably have a decent level 2 hash, I currently have a probability 1 in 2^80 collision, but of course worse may be possible.

Done: https://github.com/injinj/meow_hash/blob/17b49a0bf05a440943fc29c41f5bee5ad14363a6/meow_hash.h#L173-L183

It requires length to be known at MeowBegin(), but doesn't change performance or smhasher pass.

cmuratori closed this as completed May 18, 2019

cmuratori reopened this May 19, 2019

cmuratori closed this as completed May 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LongNeighbors #45

LongNeighbors #45

injinj commented Mar 17, 2019

erthink commented Mar 17, 2019

injinj commented Mar 17, 2019

cmuratori commented Mar 28, 2019

injinj commented Mar 31, 2019

svpv commented Apr 27, 2019

rurban commented Apr 28, 2019

svpv commented Apr 28, 2019

hmakholm commented Apr 29, 2019

cmuratori commented May 8, 2019

rurban commented May 8, 2019

injinj commented May 8, 2019

cmuratori commented May 18, 2019

injinj commented May 19, 2019

cmuratori commented May 19, 2019

injinj commented May 19, 2019

NoHatCoder commented May 19, 2019

injinj commented May 20, 2019

LongNeighbors #45

LongNeighbors #45

Comments

injinj commented Mar 17, 2019

erthink commented Mar 17, 2019

injinj commented Mar 17, 2019

cmuratori commented Mar 28, 2019

injinj commented Mar 31, 2019

svpv commented Apr 27, 2019

rurban commented Apr 28, 2019

svpv commented Apr 28, 2019

hmakholm commented Apr 29, 2019

cmuratori commented May 8, 2019

rurban commented May 8, 2019

injinj commented May 8, 2019

cmuratori commented May 18, 2019

injinj commented May 19, 2019

cmuratori commented May 19, 2019

injinj commented May 19, 2019

NoHatCoder commented May 19, 2019

injinj commented May 20, 2019