Optimized curl transform formulas #97

Closed
jserv opened this issue Feb 11, 2019 · 5 comments

@jserv
Member

jserv commented Feb 11, 2019

The powsrv.io team optimized the Curl transform formulas, claiming a 7% speedup. dcurl should benefit from the changes proposed in entangled PR #803.

Reference: Curl improvement in IRI.

@wusyong wusyong added this to the sprint-201902 milestone Feb 11, 2019
@jserv jserv added the enhancement New feature or request label Feb 11, 2019
@wusyong

wusyong commented Feb 12, 2019

This uses the Quine-McCluskey algorithm to optimize the transform function. While BUILD_SSE improves significantly, by more than 10%, BUILD_AVX gains only ~1%. The benchmark was taken on node0 with commit 4afc057.
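To make the idea concrete, here is a minimal sketch of how a minimized formula can be checked against the Curl S-box. It is not the code from the PR. It assumes the two-bit low/high trit encoding of PearlDiver-style implementations (-1 is (1,0), 0 is (1,1), 1 is (0,1)) and the 11-entry truth table of the reference Curl-P implementation. The names `sbox_original`, `sbox_simplified` and `count_mismatches` are hypothetical. The simplification shown only drops one term by exploiting the unused (0,0) encoding as a don't-care, which is the kind of reduction Quine-McCluskey finds. The formulas actually merged may differ.

```c
/* Reference Curl-P S-box (assumed), indexed by a + 4*b + 5 for trits a, b. */
static const int TRUTH_TABLE[11] = {1, 0, -1, 2, 1, -1, 0, 2, -1, 1, 0};

/* Assumed encoding: -1 -> (low=1, high=0), 0 -> (1, 1), 1 -> (0, 1). */
static void encode(int trit, unsigned *low, unsigned *high)
{
    *low = (trit != 1);
    *high = (trit != -1);
}

/* Inverse of encode(); the unused pair (0, 0) maps to the marker 2. */
static int decode(unsigned low, unsigned high)
{
    if (low && high)
        return 0;
    if (low)
        return -1;
    if (high)
        return 1;
    return 2;
}

/* Shared tail: derive the output pair from delta and decode it. */
static int finish(unsigned la, unsigned hb, unsigned delta)
{
    unsigned low = ~delta & 1u;
    unsigned high = (la ^ hb) | delta;
    return decode(low, high);
}

/* Unoptimized formula: delta = (alpha | ~gamma) & (low_b ^ beta). */
static int sbox_original(int a, int b)
{
    unsigned la, ha, lb, hb;
    encode(a, &la, &ha);
    encode(b, &lb, &hb);
    return finish(la, hb, (la | (~hb & 1u)) & (lb ^ ha));
}

/* Hypothetical simplification: the "| ~gamma" term is dropped, because
 * it only matters for inputs where (low_b ^ beta) is already 0. */
static int sbox_simplified(int a, int b)
{
    unsigned la, ha, lb, hb;
    encode(a, &la, &ha);
    encode(b, &lb, &hb);
    return finish(la, hb, la & (lb ^ ha));
}

typedef int (*sbox_fn)(int a, int b);

/* Compare a candidate formula with the truth table on all 9 trit pairs. */
static int count_mismatches(sbox_fn f)
{
    int mismatches = 0;
    for (int b = -1; b <= 1; b++)
        for (int a = -1; a <= 1; a++)
            if (f(a, b) != TRUTH_TABLE[a + 4 * b + 5])
                mismatches++;
    return mismatches;
}
```

Calling `count_mismatches(sbox_original)` and `count_mismatches(sbox_simplified)` from a test should both report zero mismatches. An exhaustive check like this is cheap because the S-box has only nine valid inputs.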

SSE

make check BUILD_STAT=1:

  • Original:
[dcurl] Implementation CPU (Intel SSE) is initialized successfully
PoW execution times: 100 times.
Hash rate average value: 7997.251 kH/sec,
with the range +- 109.604 kH/sec including 95% of the hash rate values.
  • Optimized:
[dcurl] Implementation CPU (Intel SSE) is initialized successfully
PoW execution times: 100 times.
Hash rate average value: 9075.665 kH/sec,
with the range +- 112.197 kH/sec including 95% of the hash rate values.

AVX

make check BUILD_AVX=1 BUILD_STAT=1:

  • Original:
    Hash rate is around 94XX kH/s across multiple runs
[dcurl] Implementation CPU (Intel AVX) is initialized successfully
PoW execution times: 100 times.
Hash rate average value: 9452.758 kH/sec,
with the range +- 394.444 kH/sec including 95% of the hash rate values.
  • Optimized:
    Hash rate is around 95XX kH/s across multiple runs
[dcurl] Implementation CPU (Intel AVX) is initialized successfully
PoW execution times: 100 times.
Hash rate average value: 9578.751 kH/sec,
with the range +- 341.911 kH/sec including 95% of the hash rate values.
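As a rough way to read these figures, the sketch below computes the relative speedup from two reported averages and checks whether the two "+-" ranges overlap. The type `hash_rate_t` and both function names are hypothetical. The overlap test is only a coarse heuristic, since the reported range covers 95% of the samples and is not a confidence interval on the mean.

```c
#include <stdbool.h>

/* One benchmark summary: average hash rate and the reported +- range. */
typedef struct {
    double mean;       /* kH/sec */
    double half_range; /* kH/sec, covering 95% of the samples */
} hash_rate_t;

/* Relative change of the average, in percent. */
static double speedup_percent(hash_rate_t before, hash_rate_t after)
{
    return (after.mean / before.mean - 1.0) * 100.0;
}

/* True when the two reported ranges share at least one value. */
static bool ranges_overlap(hash_rate_t a, hash_rate_t b)
{
    return a.mean - a.half_range <= b.mean + b.half_range &&
           b.mean - b.half_range <= a.mean + a.half_range;
}
```

With the SSE numbers above (7997.251 +- 109.604 vs. 9075.665 +- 112.197), this gives a speedup of about 13.5% and non-overlapping ranges. With the AVX numbers (9452.758 +- 394.444 vs. 9578.751 +- 341.911), it gives about 1.3% with overlapping ranges, so that gain is within the run-to-run spread.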

@jserv
Member Author

jserv commented Feb 12, 2019

@wusyong, can you manually disable the AVX2-specific execution paths and compare again?

@wusyong

wusyong commented Feb 12, 2019

@jserv The results remain the same. After testing AVX2 on devorg, the results show no change and are sometimes worse. Even when I only removed the operator for ngamma and alpha, the result got worse.

AVX2

Original:
Hash rate is around 25xxx kH/sec

[dcurl] Implementation CPU (Intel AVX) is initialized successfully
PoW execution times: 100 times.
Hash rate average value: 25524.854 kH/sec,
with the range +- 6373.506 kH/sec including 95% of the hash rate values.

Optimized:
Hash rate is around 24xxx to 25xxx kH/sec

[dcurl] Implementation CPU (Intel AVX) is initialized successfully
PoW execution times: 100 times.
Hash rate average value: 24271.330 kH/sec,
with the range +- 7454.857 kH/sec including 95% of the hash rate values.

@jserv
Member Author

jserv commented Feb 12, 2019

The extra parameters -march=native -mtune=native could be added to CFLAGS if we want GCC to generate code that operates normally without causing #GP(0) on any byte-granularity alignment (unlike SSE instructions).

IMHO, applying the Quine-McCluskey algorithm for micro-optimization is reasonable, and we can rework the AVX backend later. Please send pull request(s) along with an appropriate explanation.

Reference: https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf
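For context on the alignment remark, here is a minimal sketch that is not taken from dcurl. With SSE2 intrinsics, `_mm_load_si128` requires a 16-byte-aligned address and faults otherwise, while `_mm_loadu_si128` accepts any address. The helper name `xor16_unaligned` is hypothetical, and the scalar fallback is only there so the snippet also builds on targets without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* XOR 16 bytes of a and b into out. None of the pointers needs to be
 * 16-byte aligned, because only unaligned load/store forms are used. */
static void xor16_unaligned(uint8_t *out, const uint8_t *a, const uint8_t *b)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *) a);
    __m128i vb = _mm_loadu_si128((const __m128i *) b);
    _mm_storeu_si128((__m128i *) out, _mm_xor_si128(va, vb));
#else
    /* Portable fallback for non-x86 builds. */
    for (int i = 0; i < 16; i++)
        out[i] = a[i] ^ b[i];
#endif
}
```

Whether the compiler emits legacy SSE or VEX-encoded forms for such code depends on the target flags. That is presumably why the comment suggests -march=native, but the effect on dcurl's hash rate would still need to be measured.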

wusyong pushed a commit to wusyong/dcurl that referenced this issue Feb 13, 2019
Utilize the Quine-McCluskey algorithm to optimize the transform function in
each PoW implementation file. SSE improves by ~10%, while AVX
and AVX2 show no significant change. The following benchmarks were run on
node0.

SSE
Original:
Hash rate average value: 7997.251 kH/sec,
with the range +- 109.604 kH/sec including 95% of the hash rate values.

Optimized:
Hash rate average value: 9075.665 kH/sec,
with the range +- 112.197 kH/sec including 95% of the hash rate values.

AVX
Original:
Hash rate is around 94XX kH/s across multiple runs
Hash rate average value: 9452.758 kH/sec,
with the range +- 394.444 kH/sec including 95% of the hash rate values.

Optimized:
Hash rate is around 95XX kH/s across multiple runs
Hash rate average value: 9578.751 kH/sec,
with the range +- 341.911 kH/sec including 95% of the hash rate values.

Resolve DLTcollab#97
@muXxer

muXxer commented Feb 14, 2019

You are welcome ;)
