Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: optimize redundant branches by looking through phis #76283

Merged
merged 3 commits into from
Sep 29, 2022

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Sep 28, 2022

In some cases the value of a block's branch predicate is correlated with the predecessor of the block. Often this correlation is hinted at by the presence of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph.

For each predecessor of a block, we evaluate the predicate value number using the values brought to the block by that predecessor. If we find correlations, we use them to drive the existing jump threading optimization.

Also, if we end up partially disambiguating such that there is just one remaining predecessor, update the value number of the predicate to reflect the values that flow in from that predecessor.

Fixes #75987.
Contributes to #48115.

Diffs

In some cases the value of a block's branch predicate is correlated with the
predecessor of the block. Often this correlation is hinted at by the presence
of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph.

For each predecessor of a block, we evaluate the predicate value number using
the values brought to the block by that predecessor. If we find correlations,
we use them to drive the existing jump threading optimization.

Also, if we end up partially disambiguating such that there is just one
remaining predecessor, update the value number of the predicate to reflect
the values that flow in from that predecessor.

Fixes dotnet#75944.
Contributes to dotnet#48115.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 28, 2022
@ghost ghost assigned AndyAyersMS Sep 28, 2022
@ghost
Copy link

ghost commented Sep 28, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

In some cases the value of a block's branch predicate is correlated with the predecessor of the block. Often this correlation is hinted at by the presence of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph.

For each predecessor of a block, we evaluate the predicate value number using the values brought to the block by that predecessor. If we find correlations, we use them to drive the existing jump threading optimization.

Also, if we end up partially disambiguating such that there is just one remaining predecessor, update the value number of the predicate to reflect the values that flow in from that predecessor.

Fixes #75987.
Contributes to #48115.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

@EgorBo PTAL
cc @dotnet/jit-contrib

Expecting a fairly large number of diffs. Looked at some of the bigger regressions and they seem to be changes in LSRA.

@jakobbotsch
Copy link
Member

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo
Copy link
Member

EgorBo commented Sep 28, 2022

Are failures related? Diffs are nice indeed 🚀

@AndyAyersMS
Copy link
Member Author

Are failures related? Diffs are nice indeed 🚀

I think so. Debugging now.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Sep 28, 2022

Are failures related? Diffs are nice indeed 🚀

I think so. Debugging now.

Fixed that but looks like it has uncovered other problems.

@AndyAyersMS
Copy link
Member Author

Are failures related? Diffs are nice indeed 🚀

I think so. Debugging now.

Fixed that but looks like it has uncovered other problems.

Oops, the fix was wrong, PhiDef VNs have some unusual argument encodings.

different SSA def numbers, despite the fact that we don't allow overlapping
SSA lifetimes. For example there may be a Phi in another block that gets
copied to some other local and then reaches the current block via that local.

So make sure that when we search local PHIs we also match the ssa def
number to ensure we're looking at the right PHIs.
@AndyAyersMS
Copy link
Member Author

Are failures related? Diffs are nice indeed 🚀

I think so. Debugging now.

Fixed that but looks like it has uncovered other problems.

Oops, the fix was wrong, PhiDef VNs have some unusual argument encodings.

The issue was that both relop operands had PhiDef VNs for V04, but only one of them matched the local PHI; the other came from a "nonlocal" PHI (via a copy to another local). So when scanning for the right local PHI we need to match both the local num and the SSA def num we get from the PhiDef VN.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Sep 29, 2022

The condition for the bug is (seemingly) pretty rare, so I don't expect the proper fix to perturb the diffs much.

Code for M1 in #75987 is now identical to M2... in both cases we needlessly build frames as we don't realize we have eliminated all the calls.

; Assembly listing for method X:M1(System.Object)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 arg0         [V00,T00] (  0,  0   )     ref  ->  zero-ref    class-hnd
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;* V02 tmp1         [V02,T01] (  0,  0   )     ref  ->  zero-ref   
;* V03 tmp2         [V03    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact single-def "NewObj constructor temp"
;* V04 tmp3         [V04,T02] (  0,  0   )     ref  ->  zero-ref    class-hnd exact single-def "NewObj constructor temp"
;
; Lcl frame size = 40

G_M9243_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
						;; size=4 bbWeight=1    PerfScore 0.25
G_M9243_IG02:              ;; offset=0004H
       4883C428             add      rsp, 40
       C3                   ret      
						;; size=5 bbWeight=1    PerfScore 1.25

Getting to this point required all of the tricks here and in #76207, in particular the fall through pred conversion / VN update / retry if block is still BBJ_COND. As seen below....

--- Trying RBO in BB03 ---
Relop [000011] BB03 value unknown, trying inference
... JT-PHI [interestingVN] in BB03 relop first operand VN is PhiDef for V02
N003 (  5,  4) [000011] J------N---                         *  NE        int    $101
N001 (  3,  2) [000009] -----------                         +--*  LCL_VAR   ref    V02 tmp1         u:2 (last use) $1c0
N002 (  1,  1) [000010] -----------                         \--*  CNS_INT   ref    null $VN.Null
... substituting ($80,$0) for ($1c0,$0) in $101 gives $100
BB01 is an ambiguous pred
... substituting ($180,$0) for ($1c0,$0) in $101 gives $42
... substituted VN implies relop is 1 when coming from pred BB02
BB02 is a true pred
BB02 is the fall-through pred
BB03 has ambiguous preds and a (true) fall through pred and no (false) preds.
Converting fall through pred BB02 to BBJ_ALWAYS
Optimizing via jump threading
Jump flow from pred BB02 -> BB03 implies predicate true; we can safely redirect flow to be BB02 -> BB05
Setting edge weights for BB02 -> BB05 to [0 .. 3.402823e+38]
BB03 has just one remaining predecessor BB01
Updating [000011] liberal VN from $101 to $100
Will retry RBO in BB03 after partial optimization

--- Trying RBO in BB03 ---
Relop [000011] BB03 value unknown, trying inference

Dominator BB01 of BB03 has relop with same liberal VN
N003 (  3,  3) [000003] J------N---                         *  NE        int    $100
N001 (  1,  1) [000001] -----------                         +--*  LCL_VAR   ref    V02 tmp1         u:1 (last use) $80
N002 (  1,  1) [000002] -----------                         \--*  CNS_INT   ref    null $VN.Null
 Redundant compare; current relop:
N003 (  5,  4) [000011] J------N---                         *  NE        int    <l:$100, c:$101>
N001 (  3,  2) [000009] -----------                         +--*  LCL_VAR   ref    V02 tmp1         u:2 (last use) $1c0
N002 (  1,  1) [000010] -----------                         \--*  CNS_INT   ref    null $VN.Null
Jump successor BB03 of BB01 reaches, relop [000011] must be true

Redundant branch opt in BB03:

removing useless STMT00002 ( ??? ... 0x00D )
N004 (  7,  6) [000012] -----------                         *  JTRUE     void   $VN.Void
N003 (  5,  4) [000011] -----------                         \--*  CNS_INT   int    1
 from BB03

Conditional folded at BB03
BB03 becomes a BBJ_ALWAYS to BB05
Compiler::optRedundantBranch removed tree:
N004 (  7,  6) [000012] -----------                         *  JTRUE     void   $VN.Void
N003 (  5,  4) [000011] -----------                         \--*  CNS_INT   int    1

@EgorBo
Copy link
Member

EgorBo commented Sep 29, 2022

@AndyAyersMS Does it improve codegen for this:

static bool Test(object obj) => obj is int;

In current Main seems to be emitting:

; Method Proga:Test(System.Object):bool
G_M55190_IG01:              
G_M55190_IG02:              
       4885C9               test     rcx, rcx
       7411                 je       SHORT G_M55190_IG05
G_M55190_IG03:              
       48B8E0B8A064FE7F0000 mov      rax, 0x7FFE64A0B8E0      ; System.Int32
       483901               cmp      qword ptr [rcx], rax
       7402                 je       SHORT G_M55190_IG05
G_M55190_IG04:              
       33C9                 xor      rcx, rcx
G_M55190_IG05:              
       33C0                 xor      eax, eax
       4885C9               test     rcx, rcx
       0F95C0               setne    al
G_M55190_IG06:              
       C3                   ret      
; Total bytes of code: 31

@EgorBo
Copy link
Member

EgorBo commented Sep 29, 2022

If it does it might close some of these:
#36649
#47920
#55841

@AndyAyersMS
Copy link
Member Author

@AndyAyersMS btw, does it improve codegen for this:

No, that is the "side effect" case which we still can't handle.

@AndyAyersMS
Copy link
Member Author

@AndyAyersMS btw, does it improve codegen for this:

No, that is the "side effect" case which we still can't handle.

You might recall handling this requires some code duplication and probably an SSA update. Likely the next thing to try and fix.

@EgorBo
Copy link
Member

EgorBo commented Sep 29, 2022

@AndyAyersMS Ah I see. By the way, when we have a PHI node and one of the preds is another PHI - does this PR handle that or give up?

@AndyAyersMS
Copy link
Member Author

@AndyAyersMS Ah I see. By the way, when we have a PHI node and one of the preds is another PHI - does this PR handle that or give up?

We give up since both those values arrive via the same pred edge. That is, given something like (GT (PHI (PHI(a, b), c)), 0) the chances that both a and b lead to the same overall VN when substituted seems small. Also, you have to start worrying about cycles in the SSA graph.

@AndyAyersMS
Copy link
Member Author

@EgorBo the bug is fixed.

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad that the diffs remain 🙂

@AndyAyersMS AndyAyersMS merged commit 35af395 into dotnet:main Sep 29, 2022
ValueNum phiArgVN = phiArgNode->GetVN(VNK_Liberal);

// We sometimes see cases where phi args do not have VNs.
// (I recall seeing this before... track down why)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

track down why

It's because we don't number PHI_ARGs which depend on a cycle (i. e. represent defs from blocks we haven't numbered yet).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT does not eliminate null check for "o ??= new()"
4 participants