-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance issues with Range type #11848
Comments
Seems relatively simple to add the right alignment check and allow this case through. Prospective fix for that and the inlining and IL issues: master...AndyAyersMS:Range |
Jit-diffs from the extra promotion:
|
Promotion BlockersAs noted above Inline BlockersThe
We can force inlining here by applying Enregistration Blockers:
|
This seems easy to fix, that assignment could be transformed into two individual field assignments. I experimented with this kind of stuff in |
FWIW That same construct knocks out V05 later on in morph... so fixing it early would unblock both DNERs.
|
Yeah, the promotion story is pretty weird. Some happens in Fixing it properly needs quite a bit of work but for now we could probably do something in |
Tried something in a hurry: mikedn/coreclr@aa2ab99#diff-bd2d23afdb1936d1982f699f760429d2 It generates this: G_M39430_IG01:
4883EC28 sub rsp, 40
90 nop
G_M39430_IG02:
488B02 mov rax, bword ptr [rdx]
8B5208 mov edx, dword ptr [rdx+8]
41B801000000 mov r8d, 1
41B9FEFFFFFF mov r9d, -2
4533D2 xor r10d, r10d
4489542420 mov dword ptr [rsp+20H], r10d
4489542424 mov dword ptr [rsp+24H], r10d
4489442420 mov dword ptr [rsp+20H], r8d
44894C2424 mov dword ptr [rsp+24H], r9d
41B801000000 mov r8d, 1
41B901000000 mov r9d, 1
448BD2 mov r10d, edx
452BD1 sub r10d, r9d
458BCA mov r9d, r10d
452BC8 sub r9d, r8d
458BD0 mov r10d, r8d
458BD9 mov r11d, r9d
4D03D3 add r10, r11
8BD2 mov edx, edx
4C3BD2 cmp r10, rdx
7716 ja SHORT G_M39430_IG05
G_M39430_IG03:
4963D0 movsxd rdx, r8d
488D0490 lea rax, bword ptr [rax+4*rdx]
488901 mov bword ptr [rcx], rax
44894908 mov dword ptr [rcx+8], r9d
488BC1 mov rax, rcx
G_M39430_IG04:
4883C428 add rsp, 40
C3 ret
G_M39430_IG05:
E8D797FEFF call ThrowHelper:ThrowArgumentOutOfRangeException()
CC int3 There's still bad stuff to investigate but I'm not sure if I'll have time for this earlier than Friday. |
Those stack stores are probably caused by variables still being not enregistered. I'll have to check but I'm pretty sure that |
Yeah, G_M55904_IG01:
4883EC28 sub rsp, 40
90 nop
G_M55904_IG02:
488B02 mov rax, bword ptr [rdx]
8B5208 mov edx, dword ptr [rdx+8]
41B801000000 mov r8d, 1
41B901000000 mov r9d, 1
448BD2 mov r10d, edx
452BD1 sub r10d, r9d
458BCA mov r9d, r10d
452BC8 sub r9d, r8d
458BD0 mov r10d, r8d
458BD9 mov r11d, r9d
4D03D3 add r10, r11
8BD2 mov edx, edx
4C3BD2 cmp r10, rdx
7716 ja SHORT G_M55904_IG05
G_M55904_IG03:
4963D0 movsxd rdx, r8d
488D0490 lea rax, bword ptr [rax+4*rdx]
488901 mov bword ptr [rcx], rax
44894908 mov dword ptr [rcx+8], r9d
488BC1 mov rax, rcx
G_M55904_IG04:
4883C428 add rsp, 40
C3 ret
G_M55904_IG05:
E89A84FDFF call ThrowHelper:ThrowArgumentOutOfRangeException()
CC int3 Hrm, something's still not right, those 1 constants are fishy... |
Eh, those constants aren't propagated because there are PHIs in the way. This is going to be fun. |
After hacking VN to do some kind of SCCP I get this: G_M55904_IG01:
4883EC28 sub rsp, 40
90 nop
G_M55904_IG02:
488B02 mov rax, bword ptr [rdx]
8B5208 mov edx, dword ptr [rdx+8]
448BC2 mov r8d, edx
4183E801 sub r8d, 1
4183E801 sub r8d, 1
458BC8 mov r9d, r8d
49FFC1 inc r9
8BD2 mov edx, edx
4C3BCA cmp r9, rdx
7713 ja SHORT G_M55904_IG05
G_M55904_IG03:
4883C004 add rax, 4
488901 mov bword ptr [rcx], rax
44894108 mov dword ptr [rcx+8], r8d
488BC1 mov rax, rcx
G_M55904_IG04:
4883C428 add rsp, 40
C3 ret
G_M55904_IG05:
E86DADFEFF call ThrowHelper:ThrowArgumentOutOfRangeException()
CC int3 Close. Except for the 2 consecutive [MethodImpl(MethodImplOptions.NoInlining)]
static Span<int> TrimFirstLast_OpenCoded(Span<int> s) => s.Slice(1, s.Length - 2); Well, I suppose we could also do it the other way around - change the range version to match the non-range version. But since we have this interesting I don't think we have anything in the JIT today that can deal with this. I suppose morph might do it, if the 2 subtract nodes would be in the same tree. They are not. So... now we need forward substitution too. |
Thanks for foraging ahead on this. Always happy to see SCCP make an appearance. |
Eh, to be clear about what I did - it should be similar in spirit to SCCP but it's a rather different implementation:
It works for this example. It doesn't work in the general case and I've yet to figure out why. One likely cause is missing PHI args, the issue that came up in the PHI use list PR. |
Turns out that this is not so simple. The forward substitution stuff I'm experimenting with happens right after SSA, where the "is single use" property is up to date. At that point we have PHIs that are in the way and block the simplification of that chain of subtracts. The Ideally we'd maintain the "is single use" property through transforms but there's a lot of work to do before this becomes practical. So we may need to approach this differently for now. It would be better if we could avoid creating those PHIs in the first place - local assertion propagation could help with that by getting rid of constant branches. Apparently this doesn't work now because we end up with a sequence of blocks that could be compacted into one but that only happens after morph. Needs more investigation. But for a start we should probably deal with struct promotion. We won't get very far without that For completeness, #4323 shows a similar case where early introduced ABI details cause issues. In that case a |
Thanks again -- hope to have time to get back to some of this soon. I'd really like to see modelling of most or all the ABI details deferred. Things are currently very messy during inlining. The struct->int retyping for small struct returns sure seems like it could wait, and could be entirely avoided if we inline. cc @CarolEidt |
Probably a stretch to promise any further improvements for 3.0, but am going to take another look. |
May decide to push the promotion bit forward (that is, allowing Ranges to be promoted, though as noted above, not enregistered) as this at least hits modestly often. Updated diffs (based at fae2a56)
Beyond that the conditional assignments to the underlying index fields inhibit our ability to reason about the values of those fields, even if we ultimately trim away the conditions and "know" that we have a from start or from end index. A sparse CCP like approach might help but it almost feels like we'd need to integrate VN and assertion prop together to get the real benefit, as assertion prop can allow us to improve VNs, and improved VNs may allow us more use of assertions. But since updates are stylized perhaps running them together is not really needed. No new VNs are created -- all edge pruning does is setup or break equivalences among VNs. If we go for some sparse eval scheme in assertion prop, we have to decide if we're going to approach this from an optimistic (no blocks reachable until proven otherwise) or pessimistic (all blocks reachable until proven otherwise) viewpoint. Pessimistic is possibly more appropriate for the jit so we can bail at any point and not worry about being incorrect. So one off-the-cuff idea here is to VN as we do now, and track PHI input and output VNs as candidates for a union-find style algorithm... initially assume all these are distinct VNs but as we are able to prune SSA edges away we will create degenerate PHIs that let us merge some inputs and outputs. Would need some level of indirection in VN access, so would keep some kind of VN rename table that maps the full set of VNs into canonical leader VNs. We'd also need to know which preds supply which PHI inputs (and keep a full roster of all pred inputs, or a moral equivalent). |
For a while now the jit has been able to promote an outer struct A with an inner struct field B that itself has a single non-struct field C, provided that C occupies all of B and that C and B are pointer-sized. For example, this comes up when supporting promotion of `Span<T>`, as a span contains a `ByReference<T>` field that itself contains a pointer-sized field. This change relaxes the constraints slightly, allowing B and C to be less than pointer sized, provided C still occupies all of B, and B is suitably aligned within A. Doing so allows promotion of the new `Range` type, which contains two `Index` fields that each wrap an `int`. This improves performance for uses of `Range` for simple examples like those in #22079.
…lds (#22867) For a while now the jit has been able to promote an outer struct A with an inner struct field B that itself has a single non-struct field C, provided that C occupies all of B and that C and B are pointer-sized. For example, this comes up when supporting promotion of `Span<T>`, as a span contains a `ByReference<T>` field that itself contains a pointer-sized field. This change relaxes the constraints slightly, allowing B and C to be less than pointer sized, provided C still occupies all of B, and B is suitably aligned within A. Doing so allows promotion of the new `Range` type, which contains two `Index` fields that each wrap an `int`. This improves performance for uses of `Range` for simple examples like those in #22079.
Made some preliminary improvements in dotnet/coreclr#22867. Remainder will have to wait. |
@sandreenko wondering if your recent work on small struct enregistration has fixed the last issue here. Can you take a look? |
Will do. |
I think there was a type in the range implementation, should not it be
instead of
? If yes then both methods now have
when the second
The only difference is in this instruction:
because in the second method when we come to global morph and try to do All range structs are promoted as expected, I think the issue could be closed. |
@sandreenko says:
@AndyAyersMS do you agree? If not, what is left to do? (And should this issue be used to track such things?) |
Seems like the most serious problems are now fixed, but there are some fit and finish issues. dotnet/roslyn#47629 to shows we still don't handle ranges as well as we could and includes a few tests we should explore. |
Just double checked and yes, this is the diff between |
And these are identical (though don't use Range)... that's great (and great progress). Thanks! |
Interesting, I haven't looked at this for a while, so am pleasantly surprised.
I think we can close it. I suspect some of the linked issues may still be relevant. |
Ideally the following two methods should generate similar code.
Currently the
Range
-based version runs into a number of issues:Range
s are not promotable (fixed via JIT: allow slightly more general promotion of structs with struct fields coreclr#22867)Index
constructor is blocking enregistration of fields (similar byref on IL stack across branch exposure issue as we saw in Improve perf for Index based span indexers coreclr#21196) (fixed in Index and Range updates coreclr#22331)Range
enregistration (8 byte structs are returned by-value, so jit “retypes” the two adjacent ints in a Range as a long).cc @stephentoub
Will fill in more details later.
Marking as 3.0 for now.
category:cq
theme:optimization
skill-level:expert
cost:large
The text was updated successfully, but these errors were encountered: