Allow dcurl to compile and execute AVX code implementation on the hardware not supporting AVX2 #83

Closed
marktwtn opened this issue Nov 13, 2018 · 3 comments

@marktwtn
Collaborator

marktwtn commented Nov 13, 2018

The BUILD_AVX=1 build option enables the AVX implementation of dcurl.
However, the AVX code currently has two different implementations:

  • AVX instructions only.
    Not the default. Using it requires modifying the Makefile and the source code.
  • AVX and AVX2 instructions.
    The default.

This issue is about automatically selecting the correct AVX implementation based on the instructions the hardware supports.
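
One way the selection could be done at run time is sketched below. This is only an illustration, not dcurl's actual mechanism: the enum and function names are hypothetical, and it assumes GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available. On other compilers or architectures it falls back to a generic path.

```c
#include <stddef.h>

/* Hypothetical labels for the code paths described in this issue. */
enum pow_impl {
    POW_IMPL_GENERIC = 0, /* no AVX available */
    POW_IMPL_AVX1 = 1,    /* AVX instructions only */
    POW_IMPL_AVX2 = 2     /* AVX and AVX2 instructions */
};

/* Pick the most capable implementation the running CPU supports. */
static enum pow_impl select_pow_impl(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* populate the CPU feature data */
    if (__builtin_cpu_supports("avx2"))
        return POW_IMPL_AVX2;
    if (__builtin_cpu_supports("avx"))
        return POW_IMPL_AVX1;
#endif
    return POW_IMPL_GENERIC;
}

/* Human-readable name, e.g. for a log line at start-up. */
static const char *pow_impl_name(enum pow_impl impl)
{
    switch (impl) {
    case POW_IMPL_AVX2:
        return "avx2";
    case POW_IMPL_AVX1:
        return "avx";
    default:
        return "generic";
    }
}
```

A caller would invoke `select_pow_impl()` once at initialization and dispatch on the result. A build-time alternative would be to test the compiler's `__AVX2__` and `__AVX__` macros, which only reflects the build machine when compiling with flags such as `-march=native`.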

@marktwtn marktwtn self-assigned this Nov 13, 2018
@jserv
Member

jserv commented Nov 17, 2018

AVX (not AVX2) support is crucial for AMD Ryzen, since its AVX2 performance is known to be slower than that of the Intel Core i9 series.

@marktwtn
Collaborator Author

With the -Ofast optimization level, the AVX version never finishes executing.
The AVX2 version is not affected.
Changing the optimization level to -O3 makes the problem disappear.

The problem does not depend on the GCC version.

The generated assembly differs as follows:

-O3

        vucomisd        %xmm3, %xmm0
        jp      .L114
        jne     .L114

-Ofast

        vcomisd %xmm3, %xmm0
        jne     .L114

This code is generated when an element of a __m256d variable is compared with a constant or used as an operand of a logical operation.

Example:

__m256d nonce_probe = ...
...
nonce_probe[0] == LBITS

__m256d carry;
...
i == INCR_START || carry[0]

The jp instruction jumps if either operand of the preceding comparison is NaN.
NaN values are defined in the IEEE 754 floating-point standard.

The bitwise operations in the PoW might create NaN values.
However, NaN is not handled at the -Ofast optimization level, because -Ofast enables -ffast-math, which assumes that NaN never occurs, and so the jp instruction is optimized out.
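
To illustrate why a bit pattern held in a double can break an equality test, here is a small standalone sketch. It is not taken from dcurl: the helper names are hypothetical, and the all-ones pattern is only one example of a NaN encoding. It contrasts a floating-point comparison with a comparison of the raw bits.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double, similar to what happens when
 * bitwise state is stored in one lane of a __m256d. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d); /* well-defined type punning */
    return d;
}

/* Floating-point equality: under IEEE 754 semantics this is false whenever
 * either operand is NaN, even if both bit patterns are identical. */
static int equal_as_double(uint64_t a, uint64_t b)
{
    return bits_to_double(a) == bits_to_double(b);
}

/* Integer equality on the raw bits: not affected by NaN encodings. */
static int equal_as_bits(uint64_t a, uint64_t b)
{
    return a == b;
}
```

With IEEE semantics (for example at -O3), `equal_as_double(0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL)` is 0 because the pattern encodes a NaN, while `equal_as_bits` returns 1 for the same inputs. Under -Ofast the compiler may assume NaN never occurs, so the result of the floating-point comparison is no longer reliable.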

@jserv
Member

jserv commented Nov 19, 2018

Let's stick to the -O3 optimization level and document the reason for future tracking.
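
As a possible way to keep track of this constraint in the source itself, the sketch below reports whether the compiler was asked to drop NaN handling. The function name is hypothetical and nothing like it is confirmed to exist in dcurl; it relies on the `__FAST_MATH__` and `__FINITE_MATH_ONLY__` macros that GCC and Clang predefine.

```c
/* Returns 1 when the build keeps IEEE NaN semantics, and 0 when it was
 * compiled with -ffast-math or -ffinite-math-only (both implied by -Ofast). */
static int build_keeps_nan_checks(void)
{
#if defined(__FAST_MATH__) || \
    (defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__)
    return 0;
#else
    return 1;
#endif
}
```

The same preprocessor condition could instead guard an `#error` directive in the AVX source file, so that a build using -Ofast fails with an explanatory message rather than producing a binary that never finishes.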
