Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add some avx512f intrinsics(mask, rotation, shift) #884

Merged
merged 9 commits into from
Aug 25, 2020
Merged

add some avx512f intrinsics(mask, rotation, shift) #884

merged 9 commits into from
Aug 25, 2020

Conversation

minybot
Copy link
Contributor

@minybot minybot commented Aug 18, 2020

add some avx512f intrinsics
[_mm512_and_epi32]
[_mm512_and_epi64]
[_mm512_and_si512]
[_mm512_kand]
[_mm512_kor]
[_mm512_kxor]
[_kand_mask16]
[_kor_mask16]
[_kxor_mask16]
[_mm512_mask_and_epi32]
[_mm512_mask_and_epi64]
[_mm512_mask_or_epi32]
[_mm512_mask_or_epi64]
[_mm512_mask_rol_epi32]
[_mm512_mask_rol_epi64]
[_mm512_mask_rolv_epi32]
[_mm512_mask_rolv_epi64]
[_mm512_mask_ror_epi32]
[_mm512_mask_ror_epi64]
[_mm512_mask_rorv_epi32]
[_mm512_mask_rorv_epi64]
[_mm512_mask_sll_epi32]
[_mm512_mask_sll_epi64]
[_mm512_mask_slli_epi32]
[_mm512_mask_slli_epi64]
[_mm512_mask_sllv_epi32]
[_mm512_mask_sllv_epi64]
[_mm512_mask_sra_epi32]
[_mm512_mask_sra_epi64]
[_mm512_mask_srai_epi32]
[_mm512_mask_srai_epi64]
[_mm512_mask_srav_epi32]
[_mm512_mask_srav_epi64]
[_mm512_mask_srl_epi32]
[_mm512_mask_srl_epi64]
[_mm512_mask_srli_epi32]
[_mm512_mask_srli_epi64]
[_mm512_mask_srlv_epi32]
[_mm512_mask_srlv_epi64]
[_mm512_mask_xor_epi32]
[_mm512_mask_xor_epi64]
[_mm512_maskz_and_epi32]
[_mm512_maskz_and_epi64]
[_mm512_maskz_or_epi32]
[_mm512_maskz_or_epi64]
[_mm512_maskz_rol_epi32]
[_mm512_maskz_rol_epi64]
[_mm512_maskz_rolv_epi32]
[_mm512_maskz_rolv_epi64]
[_mm512_maskz_ror_epi32]
[_mm512_maskz_ror_epi64]
[_mm512_maskz_rorv_epi32]
[_mm512_maskz_rorv_epi64]
[_mm512_maskz_sll_epi32]
[_mm512_maskz_sll_epi64]
[_mm512_maskz_slli_epi32]
[_mm512_maskz_slli_epi64]
[_mm512_maskz_sllv_epi32]
[_mm512_maskz_sllv_epi64]
[_mm512_maskz_sra_epi32]
[_mm512_maskz_sra_epi64]
[_mm512_maskz_srai_epi32]
[_mm512_maskz_srai_epi64]
[_mm512_maskz_srav_epi32]
[_mm512_maskz_srav_epi64]
[_mm512_maskz_srl_epi32]
[_mm512_maskz_srl_epi64]
[_mm512_maskz_srli_epi32]
[_mm512_maskz_srli_epi64]
[_mm512_maskz_srlv_epi32]
[_mm512_maskz_srlv_epi64]
[_mm512_maskz_xor_epi32]
[_mm512_maskz_xor_epi64]
[_mm512_or_epi32]
[_mm512_or_epi64]
[_mm512_or_si512]
[_mm512_rol_epi32]
[_mm512_rol_epi64]
[_mm512_rolv_epi32]
[_mm512_rolv_epi64]
[_mm512_ror_epi32]
[_mm512_ror_epi64]
[_mm512_rorv_epi32]
[_mm512_rorv_epi64]
[_mm512_sll_epi32]
[_mm512_sll_epi64]
[_mm512_slli_epi32]
[_mm512_slli_epi64]
[_mm512_sllv_epi32]
[_mm512_sllv_epi64]
[_mm512_sra_epi32]
[_mm512_sra_epi64]
[_mm512_srai_epi32]
[_mm512_srai_epi64]
[_mm512_srav_epi32]
[_mm512_srav_epi64]
[_mm512_srl_epi32]
[_mm512_srl_epi64]
[_mm512_srli_epi32]
[_mm512_srli_epi64]
[_mm512_srlv_epi32]
[_mm512_srlv_epi64]
[_mm512_xor_epi32]
[_mm512_xor_epi64]
[_mm512_xor_si512]

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @gnzlbg (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@Amanieu
Copy link
Member

Amanieu commented Aug 19, 2020

Can you add #[rustc_args_required_const(X)] for all the imm8 arguments to the shifts and rotates?

@Amanieu
Copy link
Member

Amanieu commented Aug 20, 2020

You should run cargo fmt to fix the CI errors.

@Amanieu
Copy link
Member

Amanieu commented Aug 20, 2020

thread 'rustc' panicked at 'index out of bounds: the len is 2 but the index is 2', src/librustc_mir/transform/promote_consts.rs:412:81

Double-check your rustc_args_required_const. They are indexed from 0 (the first argument is argument 0).

2020-08-20T13:15:34.0092099Z failed to verify `_mm512_rol_epi32`
2020-08-20T13:15:34.0092659Z   * failed to equate: `const int` and PrimUnsigned(8) for _mm512_rol_epi32
2020-08-20T13:15:34.0093152Z failed to verify `_mm512_mask_rol_epi32`
2020-08-20T13:15:34.0093605Z   * failed to equate: `const int` and PrimUnsigned(8) for _mm512_mask_rol_epi32
2020-08-20T13:15:34.0094027Z failed to verify `_mm512_maskz_rol_epi32`
2020-08-20T13:15:34.0094482Z   * failed to equate: `const int` and PrimUnsigned(8) for _mm512_maskz_rol_epi32
2020-08-20T13:15:34.0094938Z failed to verify `_mm512_ror_epi32`
2020-08-20T13:15:34.0103706Z   * failed to equate: `int` and PrimUnsigned(8) for _mm512_ror_epi32
2020-08-20T13:15:34.0104711Z error: test failed, to rerun pass '--test x86-intel'
2020-08-20T13:15:34.0106140Z failed to verify `_mm512_mask_ror_epi32`
2020-08-20T13:15:34.0106493Z   * failed to equate: `int` and PrimUnsigned(8) for _mm512_mask_ror_epi32
2020-08-20T13:15:34.0106892Z failed to verify `_mm512_maskz_ror_epi32`
2020-08-20T13:15:34.0107293Z   * failed to equate: `int` and PrimUnsigned(8) for _mm512_maskz_ror_epi32
2020-08-20T13:15:34.0107675Z failed to verify `_mm512_rol_epi64`
2020-08-20T13:15:34.0108049Z   * failed to equate: `const int` and PrimUnsigned(8) for _mm512_rol_epi64
2020-08-20T13:15:34.0108806Z failed to verify `_mm512_mask_rol_epi64`
2020-08-20T13:15:34.0109256Z   * failed to equate: `const int` and PrimUnsigned(8) for _mm512_mask_rol_epi64
2020-08-20T13:15:34.0109642Z failed to verify `_mm512_maskz_rol_epi64`
2020-08-20T13:15:34.0110043Z   * failed to equate: `const int` and PrimUnsigned(8) for _mm512_maskz_rol_epi64
2020-08-20T13:15:34.0110557Z failed to verify `_mm512_ror_epi64`
2020-08-20T13:15:34.0110934Z   * failed to equate: `int` and PrimUnsigned(8) for _mm512_ror_epi64
2020-08-20T13:15:34.0111318Z failed to verify `_mm512_mask_ror_epi64`
2020-08-20T13:15:34.0111708Z   * failed to equate: `int` and PrimUnsigned(8) for _mm512_mask_ror_epi64
2020-08-20T13:15:34.0112121Z failed to verify `_mm512_maskz_ror_epi64`
2020-08-20T13:15:34.0112495Z   * failed to equate: `int` and PrimUnsigned(8) for _mm512_maskz_ror_epi64
2020-08-20T13:15:34.0113211Z thread 'verify_all_signatures' panicked at 'assertion failed: all_valid', crates/stdarch-verify/tests/x86-intel.rs:356:5

Use i32 for the immediate operand since that is what the C api uses. In the function use an assert! to check that the value is between 0 and 255.

@minybot
Copy link
Contributor Author

minybot commented Aug 20, 2020

2020-08-20T13:15:34.0092099Z failed to verify _mm512_rol_epi32
2020-08-20T13:15:34.0092659Z * failed to equate: const int and PrimUnsigned(8) for _mm512_rol_epi32

__m512i _mm512_ror_epi32 (__m512i a, int imm8)
I use u8 in the code, so I need to match the format from the spec ( use i32 instead of u8 )?
However, if I use mm512_ror_epi32(a, -1) in C, Clang will not compile it.

@Amanieu
Copy link
Member

Amanieu commented Aug 20, 2020

Use an i32 and then use assert! to check that the value is in range 0..256.

@Amanieu
Copy link
Member

Amanieu commented Aug 20, 2020

Unfortunately rustc currently doesn't support compile-time bound-checking for intrinsic arguments, so we need to use a runtime assert instead.

@minybot
Copy link
Contributor Author

minybot commented Aug 20, 2020

Unfortunately rustc currently doesn't support compile-time bound-checking for intrinsic arguments, so we need to use a runtime assert instead.

Will it be supported in the future?

@Amanieu
Copy link
Member

Amanieu commented Aug 20, 2020

There are plans to support it with where bounds on const generics. But from what I know this feature is still very much a work in progress.

In any case we currently do runtime bound checking for all the other intrinsics, so it's fine to use it here as well.

@Amanieu
Copy link
Member

Amanieu commented Aug 20, 2020

Now you need to fix the assert_instr to use a constant for the imm8 parameters. You can do this with #[assert_instr(blah, imm8 = 1)]. The actual value of imm8 doesn't matter as long as it is valid and not 0.

@minybot
Copy link
Contributor Author

minybot commented Aug 22, 2020

In my platform Ubuntu 20.04 with gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2) and clang version 10.0.0-4ubuntu1,
Cargo test can pass all tests in TARGET=x86_64-unknown-linux-gnu.

@Amanieu
Copy link
Member

Amanieu commented Aug 22, 2020

The failing tests are for i686-unknown-linux-gnu, can you try on that target?

@minybot
Copy link
Contributor Author

minybot commented Aug 22, 2020

The failing tests are for i686-unknown-linux-gnu, can you try on that target?

test result: ok. 1109 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

 Running target/i686-unknown-linux-gnu/debug/deps/cpu_detection-0c9778724d31d16d

Work in my machine.

@Amanieu
Copy link
Member

Amanieu commented Aug 23, 2020

The proper command to run the tests is:

TARGET=x86_64-unknown-linux-gnu ci/run.sh

The errors happen both with x86_64-unknown-linux-gnu i686-unknown-linux-gnu.

I had a quick look, it should just be a matter of adjusting the instructions in assert_instr.

@minybot
Copy link
Contributor Author

minybot commented Aug 23, 2020

The proper command to run the tests is:

TARGET=x86_64-unknown-linux-gnu ci/run.sh

The errors happen both with x86_64-unknown-linux-gnu i686-unknown-linux-gnu.

I had a quick look, it should just be a matter of adjusting the instructions in assert_instr.

Thanks for your help.
An example is _mm512_ror_epi32.
CI generate: vprold $0x1f,%zmm1,%zmm0{%k1} (_mm512_rol_epi32)

If I use cargo +nightly rustc -- --emit asm in my test file in my machine, it generate
vprorvd %zmm0, %zmm1, %zmm0

fn main() {
#[target_feature(enable = "avx512f")]
unsafe {
let a = _mm512_set_epi32(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
let r = _mm512_ror_epi32(a, 1);
}
}

In the website, it should be
__m512i _mm512_ror_epi32 (__m512i a, int imm8)
#include <immintrin.h>
Instruction: vprord zmm, zmm, imm8

@Amanieu
Copy link
Member

Amanieu commented Aug 23, 2020

#[target_feature(enable = "avx512f")] needs to be on the function otherwise it has no effect. This is causing the intrinsic in your test case to not be inlined. (rust-lang/rust#54584, rust-lang/rust#73461)

Keep in mind that a rotate-right can be turned into a rotate-left by adjusting the immediate operand, which the compiler does when it is a constant. So it's fine for the compiler to turn a rol into a ror when the shift is a constant.

@minybot
Copy link
Contributor Author

minybot commented Aug 24, 2020

Keep in mind that a rotate-right can be turned into a rotate-left by adjusting the immediate operand, which the compiler does when it is a constant. So it's fine for the compiler to turn a rol into a ror when the shift is a constant.

When I compile it with "clang -S -masm=intel test.c -mavx512f" with the _mm512_ror_epi32(a, 1),
It can generate "vprord" correctly.

Does this two line guide how the compiler to generate the code?
#[link_name = "llvm.x86.avx512.mask.pror.d.512"]
fn vprord(a: i32x16, i8: i32) -> i32x16;

@Amanieu
Copy link
Member

Amanieu commented Aug 24, 2020

You need to put #[target_feature(enable = "avx512f")] on fn main, not inside it.

@minybot
Copy link
Contributor Author

minybot commented Aug 24, 2020

You need to put #[target_feature(enable = "avx512f")] on fn main, not inside it.

I modify my code to:

#[target_feature(enable = "avx512f")]
unsafe fn avx512() {
let a = _mm512_set_epi32(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
let r = _mm512_slli_epi32(a, 1);
}

fn main() {
unsafe { avx512(); }
}

After cargo rustc -- --emit asm, "vpslld %xmm0, %zmm1, %zmm2" is generated.

However, running TARGET=x86_64-unknown-linux-gnu ci/run.sh shows:

---- core_arch::x86::avx512f::assert__mm512_slli_epi32_vpslld stdout ----
disassembly for stdarch_test_shim__mm512_slli_epi32_vpslld:
0: lea 0xe5d1a(%rip),%rax # 17f721 <anon.9ccf9f4411a0875b0483d7be76d90a7c.130.llvm.9943284846656045949+0x6>
1: lea 0x18e602(%rip),%rcx # 228010 <_ZN12stdarch_test11_DONT_DEDUP17h57273fcf81c79015E>
2: mov %rax,(%rcx)
3: vpaddd %zmm0,%zmm0,%zmm0
4: retq
5: nopl 0x0(%rax,%rax,1)

It seems that it generated wrong code?

@Amanieu
Copy link
Member

Amanieu commented Aug 24, 2020

A left shift by 1 is the same as multiplying the number by 2. The compiler therefore optimizes it to an add instruction. Use a different immediate value in assert_instr to avoid this.

@minybot
Copy link
Contributor Author

minybot commented Aug 24, 2020

A left shift by 1 is the same as multiplying the number by 2. The compiler therefore optimizes it to an add instruction. Use a different immediate value in assert_instr to avoid this.

Yes, after I change from 1 to 5 #[cfg_attr(test, assert_instr(vpslld, imm8 = 5))], it passed.

However, __mm512_ror_epi32 still generates "vprold" whatever I change the imm8 value.

Also,
#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vpandd))]
pub unsafe fn _mm512_and_epi32(a: __m512i, b: __m512i) -> __m512i {
transmute(simd_and(a.as_i32x16(), b.as_i32x16()))
}
Do you know how to solve this? It generate "VPANDQ" instead of "VPANDD"

"Intel C/C++ Compiler Intrinsic Equivalents
VPANDD __m512i _mm512_and_epi32( __m512i a, __m512i b);
VPANDQ __m512i _mm512_and_epi64( __m512i a, __m512i b);"

@Amanieu
Copy link
Member

Amanieu commented Aug 24, 2020

AND is a bitwise operation so VPANDQ and VPANDD are equivalent if no mask is used. This means that the compiler can choose either one.

Same with VPROL/VPROR: since on i32 a rotate right by 1 is equivalent to a rotate left by 31, the compiler can choose either instruction.

In both of these cases you can just adjust the assert_instr to use the instruction that is actually generated.

@minybot
Copy link
Contributor Author

minybot commented Aug 24, 2020

In both of these cases you can just adjust the assert_instr to use the instruction that is actually generated.
Thanks. They are all sorted out.

Finally, "mask and".
It uses normal "and" instead of "simd and". Any suggestion?

#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(kandw))]
pub unsafe fn _mm512_kand(a: __mmask16, b: __mmask16) -> __mmask16 {
transmute(kandw(a, b))
}

---- core_arch::x86::avx512f::assert__mm512_kand_kandw stdout ----
disassembly for stdarch_test_shim__mm512_kand_kandw:
0: mov %edi,%eax
1: lea 0xe7d5d(%rip),%rcx # 182266 <anon.88733f2f432ee8793fe096dc7fd32f17.326.llvm.14806043085678057426+0x23>
2: lea 0x18db00(%rip),%rdx # 228010 <_ZN12stdarch_test11_DONT_DEDUP17h57273fcf81c79015E>
3: mov %rcx,(%rdx)
4: and %esi,%eax
5: retq
6: nopw %cs:0x0(%rax,%rax,1)

@Amanieu
Copy link
Member

Amanieu commented Aug 24, 2020

The masks are just integers. Unless they are used with an AVX512 instruction, the compiler has no need to actually place them in a k* mask register. So it just uses normal integer instructions.

I think it's fine to just let those use normal and instructions, just add a comment next to the assert_instr to explain why it isn't using kand.

@Amanieu Amanieu merged commit 7ba34a3 into rust-lang:master Aug 25, 2020
@Amanieu
Copy link
Member

Amanieu commented Aug 25, 2020

Thanks!

@minybot minybot deleted the avx512 branch August 25, 2020 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants