Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LongNeighbors #45

Closed
injinj opened this issue Mar 17, 2019 · 17 comments
Closed

LongNeighbors #45

injinj opened this issue Mar 17, 2019 · 17 comments

Comments

@injinj
Copy link

injinj commented Mar 17, 2019

@hmakholm has a smhasher test called LongNeighbors. This test creates similar strings by flipping bits and looking for collisions. I ran Meow 128 through this test and it fails that.

A description of this test is LongNeighborTest

My driver for Meow in the smhasher test is here smhasher, specifically, this MeowTest.cpp

I don't think there is a bug there, but maybe... I was able to modify the Meow algo to pass this test, but I don't think it would be acceptable: doubling the number of AES rounds. This causes the throughput to be 50% of the current version (smhasher reports 24MiB vs 48MiB unmodified).

It's easy to recreate this test under Linux:

$ git clone https://github.com/injinj/smhasher
$ cd smhasher
$ cmake src
$ make
$  ./SMHasher Meow128 -o                                                                                                   
-------------------------------------------------------------------------------
--- Testing Meow128 (Meow 128-bit)

[[[ Keyset 'TestLongNeighbors' Tests ]]]

Looking for 2-bit collisions in the last 2400 bits.
Looking for 4-bit collisions in the last 2048 bits.
Looking for 6-bit collisions in the last 160 bits.
Trying bases of length 10 to 300, 5 of each length.
......[16]...............[32].....
This horrible collision has surprise score 4.14258e+25 and length 38 bytes:
000: B3 C1 BA 57 71 18 B2 6B AE C8 3A C1 A9 04 C4 E3 3B F6 F0 62 84 1F 78 99
                                                          ^D2^     ^1B^
018: BD CF 10 8E 9A 73 32 4F 29 17 84 B8 86 33
    ^B5^                 ^4D^     ^8C^
The hashes are length 16 bytes:
000: B9 4B AF 11 1D DD 42 F0 15 78 BA CA 97 06 7E F8
....
This horrible collision has surprise score 4.37134e+25 and length 42 bytes:
000: 99 AD 12 8C FE E7 7D 2E 17 89 47 27 AC C6 75 95 FE CF 12 8C 95 8B 91 4D
                                                                      ^93^
018: 01 BC 85 56 D9 50 FE 6D A5 0F 89 77 E2 EF A5 B6 87 E7
       ^3C^     ^DB^                 ^F7^     ^85^     ^67^
The hashes are length 16 bytes:
000: 42 4A 96 AC 8A A1 E9 FF 44 12 80 C6 20 82 CD DA
.
This horrible collision has surprise score 4.61799e+25 and length 43 bytes:
000: 90 A5 12 8C BF E5 7F 2E 57 CB 05 27 E4 CE 35 95 F7 C7 12 8C D7 99 91 5D
018: 41 BE C7 54 D3 18 BE ED AC 47 8B 75 EB 37 07 A4 9C 64 B8
    ^C1^                 ^6F^                       ^DC^
The hashes are length 16 bytes:
000: 9B E8 1A 94 69 45 EB BC EF 73 19 4F C9 38 A6 10
...
This horrible collision has surprise score 6.49333e+25 and length 46 bytes:
000: BD 8D 12 8C FB EE 75 2E 17 80 4E 26 8C E7 75 94 DA EF 12 8C 95 CA 91 0D
018: 08 BD 8D 5F F2 78 FF 64 88 27 88 3F 8E 8F CE FC E2 8A 99 CE 2B E3
                ^D2^                 ^7F^              ^18^     ^0B^
The hashes are length 16 bytes:
000: 35 C1 E6 5B 6B 66 C7 28 52 CB 9D 10 E1 87 AD A6
..[48]...............[64].

Among 285 tested bases, we expected 2.30799e-25 bads. Actually there were 5.
*********FAIL*********
@erthink
Copy link

erthink commented Mar 17, 2019

This is expected, roughly as I predicted in the comments (1, 2).

@injinj
Copy link
Author

injinj commented Mar 17, 2019

This is expected, roughly as I predicted in the comments (1, 2).

Ah yes, nice analysis.

@cmuratori
Copy link
Owner

We have addressed this in v0.5, which we will post as soon as we've finalized a few last things. Many thanks for the Long Neighbors test - I run your smhasher branch now as well as the other ones to make sure we pass them all :)

I will close this issue once v0.5 is posted and you can verify independently that we have no issues on your smhasher branch.

- Casey

@injinj
Copy link
Author

injinj commented Mar 31, 2019

Please note, the author of Long Neighbors to @hmakholm. I'm just a chimp with a keyboard. It is a cool way of attacking hash functions.

@svpv
Copy link

svpv commented Apr 27, 2019

@leo-yuriev t1ha2_atonce128 also fails the LongNeighbors test, and has been disqualified by @rurban. Is there a way to fix it?

@rurban
Copy link

rurban commented Apr 28, 2019

I would not call it "disqualified". I mostly use poor hash functions, when I know that the quality does not matter much. It just fails a test. This test came after the hash, so you can expect old hash functions to fail it. Newer hash functions don't.

@svpv
Copy link

svpv commented Apr 28, 2019

@leo-yuriev @rurban I noticed that the test doesn't fail with a non-zero seed (I simply added seed++ in t1ha2_atonce128_test; I then tried a few other non-zero seeds). So what does that mean? Are there still simple patterns in the input which result in collision, it's just harder to find them with a non-zero seed? Or does the seed make the patterns go away completely?

@hmakholm
Copy link

@svpv -- among the kinds of inputs the LongNeighbors test tries to find collisions between are long sequences of zero bytes with just a few bits set near the end. Some hashes have particular problems with inputs of that shape. If a hash has this problem and it treats its seed more or less in the same way as the input, it is quite plausible that the problem is only detected for nonzero seeds.

Once the test has shown that it is possible for the hash to get into a state where a few bits of difference in the remaining input can yield a 128-bit collision, it's prudent to assume such a state can also be reached with a nonzero seed -- but the input might need to be just right to achieve that, so the test probably won't chance upon it. The mere existence of such a state should lead to skepticism about the hash for purposes where extreme rarity of random collisions matters.

@cmuratori
Copy link
Owner

(While we are on the subject of LongNeighbors, I am finalizing Meow v0.5 right now and it is a bit haphazard because there are so many SMHashers. I currently test on 4 different SMHashers just to make sure there's nothing we don't pass cleanly, but it is somewhat cumbersome especially because not all of them report failures in a way that you can read at the end, so you have to scroll through the entire output sometimes to make sure nothing failed. Is there maybe a plan to someday unify the SMHashers back into one single one that we can count on to have all the latest sauce, and which clearly prints any suspicious things at the end? I ask here only because two SMHasher fork people are already on this thread - @injinj and @rurban.

In general, it isn't the end of the world if it stays the way it is, because we really only need it at the very end when we're pretty sure the hash function is solid, so it's rare that anything fails. But even so, it would be nice to have a more fire-and-forget way of validating through all SMHasher variants).

- Casey

@rurban
Copy link

rurban commented May 8, 2019

LongNeighbors still has a licensing problem, thus it cannot yet be published. so I can have it only in a branch. see rurban/smhasher#58

@injinj's gnuplot additions and yves https://github.com/demerphq/smhasher/ similar svg and TAP additions are also cool, besides the support for extra seed size in the structs. I have to add extra init code for such hashes so far.

@injinj
Copy link
Author

injinj commented May 8, 2019

I am using my smhasher fork for examining the hash algos that interest me. It's too much work to what @rurban smhasher is doing without having folks do pull requests. And then when you have that many hashes, you are essentially maintaining abandoned hashing projects except for the few that evolve with new CPU features (like SHA instructions), etc.

@cmuratori
Copy link
Owner

As of the v0.5 branch which is now uploaded, we pass LongNeighbors so I'm closing this issue. LongNeighbors at 128-bit and 64-bit are part of the standard testing I do before we update, too, so hopefully there will be no regressions that slip through on that, either.

- Casey

@injinj
Copy link
Author

injinj commented May 19, 2019

I forked the 0.4 branch and updated it to use 2x AESDEC rounds here: https://github.com/injinj/meow_hash This also passes LongNeighbors and is faster for small keys, but slower than v0.5 in the bulk speed test. I am going to use this version for other things.

@cmuratori cmuratori reopened this May 19, 2019
@cmuratori
Copy link
Owner

v0.5 is trying to do a lot more than pass LongNeighbors. It tries to prevent seed discovery, and allows full 128-bit seeding. So it probably won't ever get to the point where it is as fast as v0.4 with double AESDEC for very small keys, but it's not necessarily a good idea to use v0.4 with double AESDEC because I don't know that anyone has checked to see if that is secure. I suspect it probably isn't, but I haven't looked.

- Casey

@injinj
Copy link
Author

injinj commented May 19, 2019

I added your comments to the fork README. I'll still probably use it, the speed is fantastic with hyperthreading disabled, nearly as fast as the crc_c instruction.

@NoHatCoder
Copy link

Quick take: If you change it to xor the "Mixer" into the 4 initial states you probably have a decent level 2 hash, I currently have a probability 1 in 2^80 collision, but of course worse may be possible.

@injinj
Copy link
Author

injinj commented May 20, 2019

Quick take: If you change it to xor the "Mixer" into the 4 initial states you probably have a decent level 2 hash, I currently have a probability 1 in 2^80 collision, but of course worse may be possible.

Done: https://github.com/injinj/meow_hash/blob/17b49a0bf05a440943fc29c41f5bee5ad14363a6/meow_hash.h#L173-L183

It requires length to be known at MeowBegin(), but doesn't change performance or smhasher pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants