Skip to content
This repository has been archived by the owner on Sep 22, 2022. It is now read-only.

Notes about t1ha_v3 design #36

Open
erthink opened this issue Feb 13, 2020 · 6 comments
Open

Notes about t1ha_v3 design #36

erthink opened this issue Feb 13, 2020 · 6 comments
Assignees

Comments

@erthink
Copy link
Owner

erthink commented Feb 13, 2020

  1. Dirty and Fair kinds:
    • t1ha3_dirty = favour speed over quality: non-uniform, all hardware tricks (ie. AES-NI), one injection point and blinding multiplication are allowed;
    • t1ha3_fair = favour quality over speed: two injection points, no known drawbacks.
  2. Goals:
    • t1ha3_dirty: should be faster than competitors (e.g. xxHash_v3, MeowHash and wyhash_v777), while not inferior in quality; Or at least the same speed with better quality (i.e. no several drawbacks).
    • t1ha3_fair: proven excellence in the quality of competitors, with minimal performance degradation.
    • target architectures in order of priority: E2K, AMD64, AARCH64, RISC-V, MIPS64, MIPS32, ARM (no Thumb, but Thumb-2).
    • optional: additional variations of speed/quality/security tradeoffs.
  3. Endianness: No explicit LE/BE versions:
    • t1ha3_dirty: the native byte order is always used;
    • t1ha3_fair: Little Endian byte order is primary, bswap on BE platforms.
  4. Bitness:
    • result: 32, 64, 128.
    • implementations: 64, 32.
  5. Insights:
    • avoid MUM-mixer;
    • single header-only C99 implementation;
    • avoid unaligned read;
    • avoid unsafe user options;
    • SIMD-frendly;
    • non-injective mixers.

Schedule:

  • Most likely, only the final version will be published when there is no doubt about achieving the goals.
  • No specific deadlines are publicly declared. Sponsor the development if you need any guarantees.
@wangyi-fudan

This comment has been minimized.

@erthink
Copy link
Owner Author

erthink commented Feb 16, 2020

More details:

  1. I finally convinced myself to give up the Vladimir Markov's mum-mixer:
    • With good mixing, it is not uniform, loses entropy, and makes difficult to analyze the quality of hashing
    • None platforms can effectively vectorize it;
    • On all target platforms except AMD64, an additional "high-part" multiplication instruction is required.
    • Moreover, I got information (but don't ask me, please) that allows me to conclude that in all future generations of processors, the wide multiplication will be performed slightly slower (i.e. x86), and on architectures with a separate instruction (ARM, MIPS), multiplication will always start from scratch (without trying to use the result of the previous narrow multiplication). This is the complete opposite of knowing a few years ago. So, no mul_64x64_128() at all.
  2. UMAC approach but with two, three or four data-injections (or a Toeplitz construction) seems good enough for t1ha_v3:
    • Of course, will have to pay for this by speed, but it will allow to provide a proven level of quality, and support for a secret salt with an attack complexity of more than 2^64..128 (more analysis is required).
    • Most likely, the state will be based on the PRNG-kernel and have a bitness of 2-3-4 times greater than the result. This works great for E2K right now, but Haswell isn't doing so well.
    • Nonetheless, for the "dirty" kind of function I plan use Bulat Ziganshin's FARSH approach. I think this will allows get portable vectorizable code that runs faster than Wang Yi's wyhash, with slightly better quality (i.e. without mum-mixer flaws).
  3. The main difference from Yann Collet's XXH3 and Bulat Ziganshin's FARSH, seems, will be using only 64-bit operations, even multiplication:
    • This applies only to the "fair" 64-bit kinds of the function, but not to the "dirty" kind or 32-bit versions.
    • For this reason, the function will only run at full speed if all vectorized operations for 64-bit operands are available, including multiplication. This is excellent for E2K and x86 with AVX512, but consi-consa for AVX2/Neon, and bad for SSE2.
    • However, this solution seems reasonable for me, since it allows you to fully use the features of E2K and AVX512, and on AVX2 and Neon just works not bad. In any case, I don't think I should do an analog of XXH3, but I should move up.
  4. The 512-bit block size. There is no reason to do otherwise.
  5. Carefully selected mixers for short data. There will also be a set of fundamental differences from XXH3, but it's too early to put cards on the table.

@easyaspi314

This comment has been minimized.

@erthink
Copy link
Owner Author

erthink commented Mar 8, 2020

@easyaspi314, thank for feedback!

I'm doing a very different job now, but I think reasonable to clarify some aspect:

  • XXH3 by Yann (@Cyan4973) is perfect enough for now, i.e. we don't need a clone of XXH3.
  • E2K will be my main target platform for foreseeable future for a number of reasons. The most important thing is that this isa really serious technology claim, since a prototypes are significantly ahead of Intel/AMD (adjusted to the silicon lithography, which does not stand a chance without "proton light"). For clarity E2K could "run" x86 code like a Valgrind, just for compatibility for existing software.
  • AVX512 is not an objective, but a minimal basis corresponding to next E2K.

@erthink
Copy link
Owner Author

erthink commented Apr 21, 2020

Wow, (magically) my t1ha_v3 draft have a lot in common with SHISHUA by @espadrine.
If things go on like this, there's nothing left for me.

@gzm55
Copy link

gzm55 commented Apr 22, 2020

newton and leibniz are both greatest mathematicians who independently invented calculus. look towards for your great work~

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants