Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src/libkeccak: Use XKCP's non-CPU-specific 64-bit optimized implementation #8

Merged
merged 6 commits into from
Dec 20, 2018

Conversation

cakoose
Copy link
Contributor

@cakoose cakoose commented Dec 18, 2018

It looks like we were using the reference C implementation, which is not
designed for good performance.

Switching to the non-CPU-specific 64-bit optimized implementation makes
us ~2x faster on small inputs and ~6x faster on large inputs.

@cakoose
Copy link
Contributor Author

cakoose commented Dec 18, 2018

Running the benchmark on macOS 10.14.2, Node 10.14.2. (Not a typo -- exact same version number!)

Old code:

Bindings (current) x 166,786 ops/sec ±0.87% (85 runs sampled)
Pure JS (current) x 200,693 ops/sec ±0.96% (94 runs sampled)
Pure JS (sha3) x 11,886 ops/sec ±1.77% (94 runs sampled)
Pure JS (js-sha3) x 268,402 ops/sec ±0.97% (92 runs sampled)
Buffer 0bytes: fastest is Pure JS (js-sha3)
Bindings (current) x 4.05 ops/sec ±0.45% (15 runs sampled)
Pure JS (current) x 4.58 ops/sec ±1.40% (16 runs sampled)
Pure JS (sha3) x 0.16 ops/sec ±3.66% (5 runs sampled)
Pure JS (js-sha3) x 4.75 ops/sec ±2.82% (16 runs sampled)
Buffer 10MiB: fastest is Pure JS (js-sha3)

New code

Bindings (current) x 324,340 ops/sec ±1.19% (87 runs sampled)
...
Bindings (current) x 27.87 ops/sec ±1.31% (50 runs sampled)
...

(This uses the updated benchmark code from #7.)

@cakoose cakoose force-pushed the libkeccak-opt64 branch 3 times, most recently from 0fbea92 to 9442cb0 Compare December 18, 2018 02:55
@fanatid
Copy link
Member

fanatid commented Dec 19, 2018

@cakoose users with 32-bit CPU will receive segfault?

@cakoose
Copy link
Contributor Author

cakoose commented Dec 19, 2018

I think it will run correctly, but it will not run as quickly as the 32-bit optimized code.

Unfortunately, I'm on macOS and can't test because I can't find a 32-bit build of Node.

If you're on Linux, I think you can install a 32-bit build of Node (link) and then do:

$ node-gyp clean configure build --verbose --arch=ia32  # force 32-bit build
...
$ file build/Release/keccak.node  # confirm that binary is 32-bit
...
$ npm test
...

Would also be interesting to run the benchmark on 32-bit node and see if the current reference code runs faster than the 64-bit optimized code.

Ideally we would provide both the 32-bit and 64-bit optimized code and have node-gyp select the code depending on the platform. It seems like that should be possible, but unfortunately I know very little about node-gyp.

@fanatid
Copy link
Member

fanatid commented Dec 19, 2018

After cakoose#1 we will need figure out how support 32-bit version and can release major version.

@cakoose
Copy link
Contributor Author

cakoose commented Dec 19, 2018

I figured out how to configure "binding.gyp" to do what we want.

  1. I imported both the 32-bit optimized C code and the 64-bit optimized C code from XKCP.
  2. If the Node process.arch is "arm64", "ppc64", or "x64", we will build the 64-bit optimized implementation.
  3. Otherwise, we will build the 32-bit optimized implementation.

NOTE: I tested the 32-bit optimized C code on my 64-bit machine and it still runs faster than the "reference" C code we currently use (1.7x faster on short inputs, 2.5x faster on long inputs).

@fanatid
Copy link
Member

fanatid commented Dec 19, 2018

Awesome! Thank you @cakoose
Please check cakoose#1

@cakoose
Copy link
Contributor Author

cakoose commented Dec 19, 2018

I merged your commits into my branch and the tests passed. (I also incorporated your "src/README.md" fix into my own commit.)

Is there anything else you would like me to do?

cakoose and others added 6 commits December 19, 2018 13:57
It looks like we were using the reference C implementation, which is not
designed for good performance.

Switch to using either the 32-bit or 64-bit optimized C implementations,
depending on the Node "arch" variable.
@fanatid fanatid merged commit a7e7b02 into cryptocoinjs:master Dec 20, 2018
@fanatid
Copy link
Member

fanatid commented Dec 20, 2018

Thank you for big update and performance improvement! Published as 2.0.0

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants