Use cmeq, cmge, cmgt (zero) when one of the operands is Vector64/128<T>.Zero #33972

Closed
echesakov opened this issue Mar 23, 2020 · 3 comments
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization Priority:3 Work that is nice to have

@echesakov
Contributor

echesakov commented Mar 23, 2020

For example, the code generated for BitArray:.ctor, as shown in #33749 (comment):

dup     v17.16b, wzr
cmeq    v16.16b, v16.16b, v17.16b

should be optimized down to

cmeq    v16.16b, v16.16b, #0

This applies to all intrinsics that map to the cmeq, cmge, cmgt, cmle, cmlt, fcmeq, fcmge, fcmgt, fcmle and fcmlt instructions.
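
A minimal C# sketch of the source pattern in question, for illustration only. The class and method names (`ZeroCompareSketch`, `IsZeroMask`) are hypothetical and do not come from dotnet/runtime. The `AdvSimd.CompareEqual` call with `Vector128<byte>.Zero` as one operand is the shape that should lower to the zero form of `cmeq`; the scalar fallback is an assumption added so the sketch also runs on hardware without AdvSimd.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical helper for illustration; not part of dotnet/runtime.
public static class ZeroCompareSketch
{
    // Returns a mask with every bit set in the lanes where the input byte is zero.
    public static Vector128<byte> IsZeroMask(Vector128<byte> value)
    {
        if (AdvSimd.IsSupported)
        {
            // One operand is Vector128<byte>.Zero, so a JIT with this
            // optimization can emit "cmeq Vd.16b, Vn.16b, #0" rather than
            // materializing a zero register with "dup" first.
            return AdvSimd.CompareEqual(value, Vector128<byte>.Zero);
        }

        // Scalar fallback (assumption) so the sketch runs on non-Arm64 hardware.
        Vector128<byte> mask = Vector128<byte>.Zero;
        for (int i = 0; i < Vector128<byte>.Count; i++)
        {
            if (value.GetElement(i) == 0)
            {
                mask = mask.WithElement(i, (byte)0xFF);
            }
        }
        return mask;
    }
}
```

Whether the zero form is actually emitted depends on the JIT version; inspecting the disassembly of `IsZeroMask` on Arm64 is the way to check.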

category:cq
theme:hardware-intrinsics
skill-level:intermediate
cost:small

@echesakov echesakov added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization labels Mar 23, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Mar 23, 2020
@Gnbrkm41
Contributor

// JIT does not support code hoisting for SIMD yet
// However comparison against zero can be replaced to cmeq against zero (vceqzq_s8)
// See dotnet/runtime#33972 for details
Vector128<byte> zero = Vector128<byte>.Zero;
fixed (bool* ptr = values)
{
    for (; (i + Vector128ByteCount * 2u) <= (uint)values.Length; i += Vector128ByteCount * 2u)
    {
        // Same logic as SSE2 path, however we lack MoveMask (equivalent) instruction
        // As a workaround, mask out the relevant bit after comparison
        // and combine by ORing all of them together (In this case, adding all of them does the same thing)
        Vector128<byte> lowerVector = AdvSimd.LoadVector128((byte*)ptr + i);
        Vector128<byte> lowerIsFalse = AdvSimd.CompareEqual(lowerVector, zero);

@BruceForstall BruceForstall added this to the Future milestone Apr 4, 2020
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2020
@CarolEidt CarolEidt modified the milestones: Future, 6.0.0 Oct 13, 2020
@JulieLeeMSFT JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 23, 2021
@JulieLeeMSFT JulieLeeMSFT removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jun 7, 2021
@echesakov echesakov modified the milestones: 6.0.0, Future Jul 7, 2021
@echesakov echesakov added the Priority:3 Work that is nice to have label Jul 7, 2021
@echesakov echesakov modified the milestones: Future, 7.0.0 Oct 15, 2021
@echesakov
Contributor Author

I believe this item was addressed fully.
@TIHan can you please confirm and close the issue?

@echesakov echesakov removed their assignment Mar 15, 2022
@TIHan
Contributor

TIHan commented Mar 15, 2022

Yes, I believe it was. There is more opportunity with other instructions like 'cmle' and 'cmlt', but the instructions named in the title of this issue (cmeq, cmge, cmgt) are covered.

@TIHan TIHan closed this as completed Mar 15, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 15, 2022
7 participants