
Change Vector2/3/4, Quaternion, Plane, Vector<T>, and Vector64/128/256/512<T> to be implemented in managed where trivially possible #102301

Merged Jun 5, 2024 (11 commits)

Conversation

tannergooding
Member

@tannergooding tannergooding commented May 16, 2024

This is a basic prototype for #102275 and covers all the vectorized types where the handling can be done trivially in managed code (such as by deferring to another intrinsic API).

As part of that, it also removes MethodImpl(MethodImplOptions.AggressiveInlining) from various APIs where the method's IL size is approximately below the always-inline threshold for RyuJIT (this was estimated by examining a few functions and finding that 2 or fewer calls with no complex logic typically fit within it).
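As a minimal sketch of the "defer to another intrinsic API" shape described above (illustrative only, not the actual corelib source; the extension methods and operator it forwards to are real public API):

```csharp
// Sketch: a managed Vector4 helper that forwards to the Vector128 intrinsic
// surface. The body is small enough that RyuJIT's inliner handles it without
// an explicit MethodImpl(MethodImplOptions.AggressiveInlining) attribute.
// Illustrative only; VectorSketch is a hypothetical name, not corelib code.
using System.Numerics;
using System.Runtime.Intrinsics;

internal static class VectorSketch
{
    public static Vector4 Add(Vector4 left, Vector4 right)
    {
        // Reinterpret as Vector128<float> and use its intrinsic operator,
        // which the JIT expands to a single SIMD add where supported.
        return (left.AsVector128() + right.AsVector128()).AsVector4();
    }
}
```

The managed body stays a thin forwarder, so the JIT only needs to recognize the one intrinsic it defers to.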

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 16, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding tannergooding changed the title Change Vector4, Quaternion, and Plane to be implemented entirely in managed Change Vector2/3/4, Quaternion, Plane, Vector<T>, and Vector64/128/256/512<T> to be implemented in managed where trivially possible May 16, 2024
@tannergooding
Member Author

Reopened now that #102702, #102827, and #102973 have gone in. We should no longer see pessimizations caused by needing constants during import, so we should see fewer diffs overall and ideally be in a position to consider getting this merged and building on it further.

@build-analysis build-analysis bot mentioned this pull request Jun 3, 2024
@tannergooding
Member Author

Things are already looking much better than before.

minopts size regressions are expected, since getting the optimized codegen now requires inlining. The overall impact still needs to be discussed, but I expect it's fine since it's Tier 0 code.

fullopts has almost no changes, and we are doing the right thing for the most part. There are some more complex methods where we run into the limits of the default inlining heuristics, given that we no longer get the [Intrinsic] profitability boost. We could solve this by still treating APIs in System.Runtime.Intrinsics as [Intrinsic], regardless of whether or not they're annotated. This gives us the profitability boost without needing to also have the implementation in managed.

There is an assert around V512.CreateScalar that needs to be looked at, and some Mono failures which at a glance look to be a legitimate bug in Mono.

/// <returns><paramref name="value" /> reinterpreted as a new <see cref="Plane" />.</returns>
internal static Plane AsPlane(this Vector4 value)
{
#if MONO
Member
Is this a perf problem or a functionality problem?

Either way, it is likely going to be fixed as a side effect of #102988.

Member Author

Possibly a functionality problem that I've not dug into yet. The previously tested commit had seen some failures related to Quaternion/Vector4/Plane and their equality tests, but only on Mono.

My current guess is that there's something subtly incorrect in the Mono handling that breaks for SIMD here and it will need a fix before BitCast can be used, but I'd like to try and ensure CoreCLR is clean without regressions before I do any more in depth Mono changes.

@tannergooding
Member Author

@MihuBot

@tannergooding
Member Author

tannergooding commented Jun 5, 2024

One regression is in System.Drawing.RectangleF:ToVector4()

-       vmovups  xmm0, xmmword ptr [rdi]
+       vmovss   xmm0, dword ptr [rdi]
+       vinsertps xmm0, xmm0, dword ptr [rdi+0x04], 16
+       vinsertps xmm0, xmm0, dword ptr [rdi+0x08], 32
+       vinsertps xmm0, xmm0, dword ptr [rdi+0x0C], 48

This one is because while we have fgMorphCombineSIMDFieldStores, we don't have an fgMorphCombineSIMDFieldLoads; instead, we only had some logic as part of NI_Vector4_Create that looked for consecutive field accesses. This shouldn't be a blocker and can be handled fairly easily in a follow-up PR.
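The C# shape behind the diff above can be sketched with a hypothetical struct mirroring RectangleF's field layout (RectF and its members are illustrative names, not the System.Drawing source):

```csharp
// Hypothetical struct mirroring System.Drawing.RectangleF's field layout.
// The four float fields are adjacent in memory, so the four scalar loads
// produced by the Vector4 constructor could, in principle, be folded into a
// single 16-byte vector load by an fgMorphCombineSIMDFieldLoads-style pass.
using System.Numerics;

internal struct RectF
{
    public float X;      // offset 0x00
    public float Y;      // offset 0x04
    public float Width;  // offset 0x08
    public float Height; // offset 0x0C

    // Loads four consecutive fields; per the diff above, this currently
    // emits vmovss plus three vinsertps rather than one vmovups.
    public Vector4 ToVector4() => new Vector4(X, Y, Width, Height);
}
```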


There's a small regression in System.IO.Hashing.Crc32:UpdateVectorized:

 G_M1352_IG02:
        mov      rax, rsi
        mov      ecx, edx
-       vmovups  xmm0, xmmword ptr [rax]
        cmp      ecx, 128
        jge      SHORT G_M1352_IG04
- 						;; size=17 bbWeight=1 PerfScore 5.75
+						;; size=13 bbWeight=1 PerfScore 1.75
 G_M1352_IG03:
-       vmovd    xmm1, rdi
-       vpxor    xmm0, xmm1, xmm0
+       vmovd    xmm0, rdi
+       vpxor    xmm0, xmm0, xmmword ptr [rax]
        jmp      G_M1352_IG07
        align    [0 bytes for IG05]
- 						;; size=13 bbWeight=0.50 PerfScore 2.17
+						;; size=13 bbWeight=0.50 PerfScore 3.50
 G_M1352_IG04:
+       vmovups  xmm0, xmmword ptr [rsi]
        vmovups  xmm1, xmmword ptr [rsi+0x10]

Nothing major, just a place where we don't share a load anymore.


In general, the JIT ends up needing to create many additional impAppendStmt, Inlining Arg, spilled call-like call argument, and Inline ldloca(s) first use temp locals. Most of these end up zero-ref, so there's potential for the JIT to optimize such scenarios better and avoid additional or unnecessary overhead (throughput-wise).

Most of the corelib regressions are in the xplat APIs that don't have any accelerated implementation today, like Dot&lt;short&gt; (there's no handling for multiply, since it would have required fairly complex IR to support). Others are cases like Vector512.OnesComplement, which would be fixed if we had it defer to op_OnesComplement rather than Vector256.OnesComplement (an easy fix in a follow-up PR).
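The easy Vector512.OnesComplement fix mentioned above could be sketched as follows (illustrative only; the operator it defers to is real public API, but this is not the corelib source):

```csharp
// Sketch of the follow-up fix: defer Vector512.OnesComplement to the type's
// own op_OnesComplement (~) rather than splitting the work into two
// Vector256.OnesComplement calls, letting the JIT emit a single 512-bit op
// where the hardware supports it. Vector512Sketch is a hypothetical name.
using System.Runtime.Intrinsics;

internal static class Vector512Sketch
{
    public static Vector512<T> OnesComplement<T>(Vector512<T> vector)
        => ~vector; // op_OnesComplement on Vector512<T>
}
```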

All in all, I think we are in a position where taking this is feasible; it doesn't look to regress any meaningful scenarios, only the already-unaccelerated edge cases. Mono is being left as is for now, since its inliner isn't as robust as RyuJIT's, but we should be able to independently investigate removing the Mono support for the xplat APIs as well, so it only needs to support the platform-specific APIs.

CC. @dotnet/jit-contrib

Member

@EgorBo EgorBo left a comment

Nice clean up in jit! 👍

@lewing
Member

lewing commented Jun 11, 2024

wasm regressions here dotnet/perf-autofiling-issues#36108 (aot) and improvements dotnet/perf-autofiling-issues#36093 (interp)

cc @radekdoulik

@tannergooding
Member Author

Likely regressions:

These should be resolved with #103177

wasm regressions here dotnet/perf-autofiling-issues#36108 (aot)

A little surprised that wasm AOT regressed, since the Mono handling hadn't been touched and the AOT support should already have been accelerated for these APIs. Is the support for PackedSIMD separate from the mainline Mono support around MonoLLVM?

improvements dotnet/perf-autofiling-issues#36093 (interp)

👍, fairly significant improvements at that. I know that the interp hadn't accelerated the Vector2/3/4 APIs but had accelerated the Vector128 APIs, so this lets it do less work and still get the benefits.

@lewing
Member

lewing commented Jun 12, 2024

Likely regressions:

These should be resolved with #103177

wasm regressions here dotnet/perf-autofiling-issues#36108 (aot)

A little surprised that wasm AOT regressed, since the Mono handling hadn't been touched and the AOT support should already have been accelerated for these APIs. Is the support for PackedSIMD separate from the mainline Mono support around MonoLLVM?

I find it surprising too. @radekdoulik, can you take a look at our codegen here, please?

improvements dotnet/perf-autofiling-issues#36093 (interp)

👍, fairly significant improvements at that. I know that the interp hadn't accelerated the Vector2/3/4 APIs but had accelerated the Vector128 APIs, so this lets it do less work and still get the benefits.

Yes, nice improvements

@matouskozak
Member

matouskozak commented Jun 14, 2024

Possibly related: noticeable size improvements on Mono iOS HelloWorld, dotnet/perf-autofiling-issues#35768

@github-actions github-actions bot locked and limited conversation to collaborators Jul 21, 2024