Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment - Remove space restriction from XarchIntrinsicConstants #102913

Closed

Conversation

khushal1996
Copy link
Contributor

NO NEED FOR REVIEW

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 31, 2024
Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@@ -50,11 +50,11 @@ extern RhConfig * g_pRhConfig;

#if defined(HOST_X86) || defined(HOST_AMD64) || defined(HOST_ARM64)
// This field is inspected from the generated code to determine what intrinsics are available.
EXTERN_C int g_cpuFeatures;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are bit-tested at runtime using compiler generated code. This is going to be quite complicated to do without regressing perf on 32bit x86.

I already wrote this elsewhere - I would like to get an understanding why we're wasting bits in this enum - a lot of the bits are redundant, e.g. XArchIntrinsicConstants_Avx2 and XArchIntrinsicConstants_VectorT256. Why do we need this duplication? Is this putting RyuJIT implementation details ("VectorT256" that is not an ISA extension as far as I know) into PAL?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of the bits are redundant, e.g. XArchIntrinsicConstants_Avx2 and XArchIntrinsicConstants_VectorT256

That's 3 bits (VectorT128, VectorT256, VectorT512). I would not call it a lot. Is there anything else?

Why do we need this duplication? Is this putting RyuJIT implementation details ("VectorT256" that is not an ISA extension as far as I know) into PAL?

Right, we do not need the VectorT* bits in the minipal. They got there by sequence of copy&paste operations of code that was in CoreCLR JIT/EE interface originally.

We do need the VectorT* bits on JIT/EE interface (that is different enum). The vector size is modelled as virtual ISA there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's 3 bits (VectorT128, VectorT256, VectorT512). I would not call it a lot. Is there anything else?

All of the VL suffixes that seem to just be shortcuts for XArchIntrinsicConstants_Avx512f_vl plus something else that already has a bit.

The question is basically whether we can get away with 32 bits until we're ready to call Sse41 and friends "baseline". The underlying oses are starting to do that on both Windows and Linux side. Or we'll need to do the more complicated thing and extend this to arbitrary lengths like this PR is doing.

Copy link
Member

@tannergooding tannergooding May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of the bits are redundant, e.g. XArchIntrinsicConstants_Avx2 and XArchIntrinsicConstants_VectorT256.

These are not duplicate. The former describes support for Avx2 the latter describes the size of Vector<T>, these do not have to line up and it may be important to understand the encoded size of Vector<T> for ABI purposes.

There are some opportunities to fold together bits. For example, AVX512*_VL could just be Avx512vl. Given that we require AVX512F+BW+CD+DQ+VL to be provided simultaneously, we could combine all of these into one as well.

We could notably also not allow as fine-grained control as this currently is allowing. Rather than allowing individual opt-in to SSE3, SSSE3, SSE4.1, etc, we could force these to be present at the respective target baselines (i.e. x86-64-v2, x86-64-v3, x86-64-v4) and only include the standalone ISAs that are independent of the primary baselines

However, all of that is less control than most other compilers toolchains give and we're likely to run out of bits sooner than we're able to meaningfully compress them and drop support for downlevel ISAs (although I am definitively still supportive of us moving towards an x86-64-v2 baseline). So it's probably goodness that we take a look at this since we're already pushing up against the boundaries already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not duplicate.

They are duplicate/undesirable at PAL level. PAL should be policy free. XArchIntrinsicConstants_... is PAL level enum.

They are not duplicate at the JIT/EE interface level where the Vector policy size policy lives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the VL suffixes that seem to just be shortcuts for XArchIntrinsicConstants_Avx512f_vl plus something else that already has a bit.

I agree that we can use just a single bit for VL in this enum if we needed to free up more bits.

@khushal1996
Copy link
Contributor Author

@tannergooding @jkotas @MichalStrehovsky
This experiment PR was an effort towards making space for the new ISA added in PR #101938 which was urgent for us at Intel. For now, we did make space by removing unused isa Avx10v1_V256. But I was trying to make more space for ISAs. Looks like VectorT* bits in minipal were removed and we are discussing getting rid of the VL bits as well. I will keep this open for discussion and close it once completed.

@khushal1996
Copy link
Contributor Author

Closing this discussion.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-NativeAOT-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants