Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guarded Devirt: multiple type checks #59604

Closed
wants to merge 29 commits into from
Closed

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Sep 25, 2021

This PR extends GDV (Guarded Devirtualization) to support multiple type-checks (see #58984 comments ) instead of current "a check for the most popular type and a fallback for everything else".

Plan:

  • Fix bb weights
  • Make secondary cases inlineable
  • Test with unbox entries (valuetypes implementing interfaces)
  • Test with generics
  • Test with chained GDV calls
  • Test cases where some candidates aren't inlineable/devirtualizeable (but not all)
  • Validate performance impact on aspnet/TE benchmarks
  • Optional: Merge type checks if they point to the same method
  • Add unit tests

Examples (click to zoom)

No PGO (e.g. .NET 5.0):

image

PGO (main):

image

PGO (this PR):

image

Flowgraph (after INDXCALL phase):
image

cc @AndyAyersMS

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 25, 2021
@ghost
Copy link

ghost commented Sep 25, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR extends GDV to support multiple type-checks. For now secondary guesses aren't inlineable yet (but devirtualize-able).
Example:

using System;
using System.Runtime.CompilerServices;
using System.Threading;

class ClassA
{
    public virtual int DoWork() => 42;
}

class ClassB : ClassA
{
    public override int DoWork() => 43;
}

class ClassC : ClassB
{
    public override int DoWork() => 44;
}

public class Program
{
    public static void Main()
    {
        for (int i = 0; i < 200; i++)
        {
            Test(new ClassC());
            Test(new ClassB());
            Test(new ClassA());
            Thread.Sleep(10);
        }

        Console.ReadKey();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Test(ClassA a) => a.DoWork();
}

Tier1 codegen for Test with DOTNET_DynamicPGO=1 and DOTNET_JitGuardedDevirtualizationCheckCount=3:

; Assembly listing for method Program:Test(ClassA):int
G_M42821_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
                                                ;; bbWeight=1    PerfScore 0.25
G_M42821_IG02:              ;; offset=0004H
       48B8A80459D6FD7F0000 mov      rax, 0x7FFDD65904A8      ; ClassC
       483901               cmp      qword ptr [rcx], rax
       7507                 jne      SHORT G_M42821_IG04
                                                ;; bbWeight=1    PerfScore 4.25
G_M42821_IG03:              ;; offset=0013H
       B82C000000           mov      eax, 44
       EB38                 jmp      SHORT G_M42821_IG09
                                                ;; bbWeight=0.37 PerfScore 0.83
G_M42821_IG04:              ;; offset=001AH
       48B8000359D6FD7F0000 mov      rax, 0x7FFDD6590300      ; ClassB
       483901               cmp      qword ptr [rcx], rax
       7508                 jne      SHORT G_M42821_IG06
                                                ;; bbWeight=0.18 PerfScore 0.79
G_M42821_IG05:              ;; offset=0029H
       FF15611C4100         call     [ClassB:DoWork():int:this]
       EB21                 jmp      SHORT G_M42821_IG09
                                                ;; bbWeight=0.37 PerfScore 1.85
G_M42821_IG06:              ;; offset=0031H
       48B8580159D6FD7F0000 mov      rax, 0x7FFDD6590158      ; ClassA
       483901               cmp      qword ptr [rcx], rax
       7508                 jne      SHORT G_M42821_IG08
                                                ;; bbWeight=0.18 PerfScore 0.79
G_M42821_IG07:              ;; offset=0040H
       FF15A21A4100         call     [ClassA:DoWork():int:this]
       EB0A                 jmp      SHORT G_M42821_IG09
                                                ;; bbWeight=0.25 PerfScore 1.25
G_M42821_IG08:              ;; offset=0048H
       488B01               mov      rax, qword ptr [rcx]
       488B4048             mov      rax, qword ptr [rax+72]
       FF5020               call     qword ptr [rax+32]ClassA:DoWork():int:this
                                                ;; bbWeight=0.01 PerfScore 0.07
G_M42821_IG09:              ;; offset=0052H
       90                   nop
                                                ;; bbWeight=1    PerfScore 0.25
G_M42821_IG10:              ;; offset=0053H
       4883C428             add      rsp, 40
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.25

; Total bytes of code 88, prolog size 4, PerfScore 20.38, instruction count 22, allocated bytes for code 88 (MethodHash=afda58ba) for method Program:Test(ClassA):int
; ============================================================

cc @AndyAyersMS

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member

I wonder if we should take this opportunity to clean up how we're using the fields in GT_CALL. What's there now is somewhat hacky (eg we have to save/restore the stub address since it can be "live" for GDV candidates).

@EgorBo
Copy link
Member Author

EgorBo commented Oct 3, 2021

/azp run runtime-coreclr pgo

@dotnet dotnet deleted a comment from azure-pipelines bot Oct 3, 2021
@dotnet dotnet deleted a comment from azure-pipelines bot Oct 3, 2021
@EgorBo EgorBo marked this pull request as ready for review October 3, 2021 22:20
@EgorBo
Copy link
Member Author

EgorBo commented Oct 3, 2021

/azp run runtime-coreclr pgo

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Oct 4, 2021

/azp run runtime-coreclr pgo

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Oct 4, 2021

@AndyAyersMS this is ready for review now, it passes all my tests and works on complicated apps
will schedule some benchmarks this week

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few other questions...

  • What does this look like in the inline tree display?
  • Are we going to report multiple inline failures to the VM for GDV inline attempts?

If we devirtualize multiple candidates to the same method it would be nice (perhaps) for them to share the call (downside might be we lose exact type in that call).

We should think about what makes sense for chaining here too... perhaps it is only interesting if we're stringing together monomorphic GDVs.

@@ -4795,6 +4806,18 @@ struct GenTreeCall final : public GenTree
IL_OFFSET gtRawILOffset;
#endif // defined(DEBUG) || defined(INLINE_DATA)

UINT8 gtGDVCandidatesCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make a call node bigger?

If we have room for this, do we have enough room to make the union above a tagged union?

Copy link
Member Author

@EgorBo EgorBo Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndyAyersMS Not sure I follow, here is the current win-x64 layout (before my changes) in Release /d1reportSingleClassLayoutGenTreeCall:

4>class GenTreeCall	size(144):
4>	+---
4> 0	| +--- (base class GenTree)
4> 0	| | genTreeOps gtOper
4> 1	| | var_types gtType
4> 2	| | gtCSEnum
4> 3	| | gtLIRFlags
4> 4	| | AssertionInfo gtAssertionInfo
4> 6	| | _gtCostEx
4> 7	| | _gtCostSz
4> 8	| | _gtRegNum
4>  	| | <alignment member> (size=3)
4>12	| | GenTreeFlags gtFlags
4>16	| | ValueNumPair gtVNPair
4>24	| | gtRsvdRegs
4>  	| | <alignment member> (size=4)
4>32	| | gtNext
4>40	| | gtPrev
4>	| +---
4>48	| gtCallThisArg
4>56	| gtCallArgs
4>64	| gtCallLateArgs
4>72	| fgArgInfo
4>80	| tailCallInfo
4>80	| CorInfoCallConvExtension unmgdCallConv
4>88	| GenTreeCallFlags gtCallMoreFlags
4>92.	| gtCallType (bitstart=0,nbits=3)
4>92.	| gtReturnType (bitstart=3,nbits=5)
4>  	| <alignment member> (size=3)
4>96	| gtRetClsHnd
4>104	| gtCallCookie
4>104	| gtInlineCandidateInfo
4>104	| gtClassProfileCandidateInfo
4>104	| gtStubCallStubAddr
4>104	| compileTimeHelperArgumentHandle
4>104	| gtDirectCallAddress
4>112	| gtControlExpr
4>120	| gtCallMethHnd
4>120	| gtCallAddr
4>128	| CORINFO_CONST_LOOKUP gtEntryPoint

in theory we can fit the count into those spare 3 bits? (means maxvalue is 7) if the size matters that much - not sure about impact on other archs/os

src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/indirectcalltransformer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/indirectcalltransformer.cpp Show resolved Hide resolved
src/coreclr/jit/indirectcalltransformer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/indirectcalltransformer.cpp Outdated Show resolved Hide resolved
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Nov 1, 2021
@JulieLeeMSFT
Copy link
Member

@EgorBo this PR will be closed when you finish all the remaining work items?

  • Validate performance impact on aspnet/TE benchmarks
  • Optional: Merge type checks if they point to the same method
  • Add unit tests

@EgorBo
Copy link
Member Author

EgorBo commented Nov 1, 2021

@EgorBo this PR will be closed when you finish all the remaining work items?

  • Validate performance impact on aspnet/TE benchmarks
  • Optional: Merge type checks if they point to the same method
  • Add unit tests

Right, I need some real-world benchmark to prove it makes sense + most likely I need to implement "Chaining" optimization here, it works today for single-guess GDV but in my PR it's disabled for multi-guess calls so there is some overhead.

@JulieLeeMSFT
Copy link
Member

This PR is WIP.

@EgorBo
Copy link
Member Author

EgorBo commented Jan 15, 2022

Closing for now to keep number of open PR smaller, going to return to it

@EgorBo EgorBo closed this Jan 15, 2022
@EgorBo EgorBo reopened this Jan 29, 2022
@EgorBo EgorBo closed this Jan 29, 2022
@EgorBo EgorBo mentioned this pull request Jan 29, 2022
2 tasks
@ghost ghost locked as resolved and limited conversation to collaborators Mar 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants