Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Return multiple likely classes in getLikelyClass (for better GDV) #58984

Merged
merged 27 commits into from
Sep 24, 2021

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Sep 11, 2021

This PR extends getLikelyClass to return multiple likely classes like for the following sample:

using System;
using System.Runtime.CompilerServices;
using System.Threading;

class ClassA
{
    public virtual int DoWork() => 42;
}

class ClassB : ClassA { }
class ClassC : ClassB { }

public class Program
{
    public static void Main()
    {
        for (int i = 0; i < 1000; i++)
        {
            if (i % 5 == 0)
                Test(new ClassC());
            else if (i % 3 == 0)
                Test(new ClassB());
            else
                Test(new ClassA());
            Thread.Sleep(5);
        }

        Console.ReadKey();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Test(ClassA? a) => a?.DoWork() ?? 0;
}

When it tries to devirtualize a.DoWork() it now prints:

Considering guarded devirtualization at IL offset 6 (0x6)
Likely classes for 00007FFBE5DC2C10 (ClassA):
  1) 00007FFBE5DC2C10 (ClassA) [likelihood:62%]
  2) 00007FFBE5DC3580 (ClassC) [likelihood:25%]
  3) 00007FFBE5DC3438 (ClassB) [likelihood:12%]
virtual call would invoke method DoWork
Marking call [000005] as guarded devirtualization candidate; will guess for class ClassA

I'm locally playing with it to do multiple type checks for GDV like:

if (obj is A) 
    tmp = (A)a.DoWork();
else if (obj is B)
    tmp = (B)b.DoWork();
else if (obj is C)
    tmp = (C)c.DoWork(); 
else
    obj.DoWork(); // non-devirtualized fallback

or merge multiple checks if they point to the same method (it means DoWork is not overridden in B)

if (obj is A || obj is B)
    tmp = (A)a.DoWork();
else
    obj.DoWork(); // non-devirtualized fallback

Btw, here is some statistics I've harvested on AvaloniaILSpy application in FullPGO mode:
image
(So only 57% of virtual calls with class-probes data are 100% monomorphic in this app)

image

image

/cc @dotnet/jit-contrib @AndyAyersMS @jakobbotsch

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 11, 2021
@ghost
Copy link

ghost commented Sep 11, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR extends getLikelyClass to return multiple likely classes like for the following sample:

using System;
using System.Runtime.CompilerServices;
using System.Threading;

class ClassA
{
    public virtual int DoWork() => 42;
}

class ClassB : ClassA { }
class ClassC : ClassB { }

public class Program
{
    public static void Main()
    {
        for (int i = 0; i < 1000; i++)
        {
            if (i % 5 == 0)
                Test(new ClassC());
            else if (i % 3 == 0)
                Test(new ClassB());
            else
                Test(new ClassA());
            Thread.Sleep(5);
        }

        Console.ReadKey();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Test(ClassA? a) => a?.DoWork() ?? 0;
}

When it tries to devirtualize a.DoWork() it now prints:

Considering guarded devirtualization at IL offset 6 (0x6)
Likely classes for 00007FFBE5DC2C10 (ClassA):
  1) 00007FFBE5DC2C10 (ClassA) [likelihood:62%]
  2) 00007FFBE5DC3580 (ClassC) [likelihood:25%]
  3) 00007FFBE5DC3438 (ClassB) [likelihood:12%]
virtual call would invoke method DoWork
Marking call [000005] as guarded devirtualization candidate; will guess for class ClassA

I'm locally playing with it to do multiple type checks for GDV like:

if (obj is A) 
    tmp = (A)a.DoWork();
else if (obj is B)
    tmp = (B)b.DoWork();
else if (obj is C)
    tmp = (C)c.DoWork(); 
else
    obj.DoWork(); // non-devirtualized fallback

or merge multiple checks if they point to the same method (it means DoWork is not overridden in B)

if (obj is A || obj is B)
    tmp = (A)a.DoWork();
else
    obj.DoWork(); // non-devirtualized fallback

Btw, here is some statistics I've harvested on AvaloniaILSpy application in FullPGO mode:
image

so non-monomorphic callsites aren't that rare as I thought.

/cc @dotnet/jit-contrib @AndyAyersMS @jakobbotsch

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Sep 11, 2021

hm.. where should I rename getLikelyClass for clrjitilc, is it in dotnet/runtimelab?

@AndyAyersMS
Copy link
Member

It will be interesting to see how this plays out.

Couple of notes:

  • We currently only have 8 slots in the class histogram table, so the non-dominant class probabilities are likely subject to greater relative error (e.g. the smallest non-zero likelihood you will see is 12%). I wonder if we should increase the size here so we get a more accurate picture? I didn't do this originally because there ends up being a lot of wasted table space (since most sites are monomorphic) and there's no simple way to size the tables dynamically.
  • The table compression done for static PGO will need revision, either allowing for MAX_LIKELY_CLASSES or a family of records that can describe more info -- we want to ensure static PGO's optimization ability stays on more or less even footing with dynamic.
  • I had originally though the total number of unique table entries seen would play into the heuristics; it doesn't currently, but it would still perhaps be nice to know that number, instead of capping it at MAX_LIKELY_CLASSES.
  • For virtual calls there's typically a 3-stage lookup: (1) method table, (2) method table chunk; (3) method table slot. In some cases one can check for equality at a later stage and cut down on the number of comparisons, when the likely cases all share a chunk or method. That is, if they share a chunk, instead of needing two method table compares as in (mt == MT1) || (mt == MT2) one can do one chunk compare (mtc = mt[...]; mtc == MT12). Not clear if this is always better, but if there are two likely classes with similar likelihoods, perhaps it would be.
  • Vtable shape (say whether it is chunked) can vary by type and slot and may be something we rethink someday. So the strategy needs to adapt to the types in question.
  • The slot entry comparison may be tricky to engineer as the optimized callee entry point address may not be known. We should think carefully about how to handle this.
  • When looking at secondary entries I've often thought we'd "renormalize" the histogram. Say we have 50, 20, 10, (20) as the most likely probabilities + tail. Then after we've checked for the 50 case, we divide this out, and the conditional probabilities become 40, 20, (40). And if we then check for the 40 case, we end up with 33, (66). At each level we use something like our current threshold. But to make this work well we probably need more confidence in the tail probabilities.
  • Not sure how this should interact with chained GDV... thoughts?
  • Likewise, this has the potential to greatly increase code size.... how should we think about that?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 12, 2021

Thanks @AndyAyersMS these are useful notes!

instead of capping it at MAX_LIKELY_CLASSES.

I'm going to change MAX_LIKELY_CLASSES to some big value like 128 just to still allow getLikelyUsers to use a stack-allocated array as an output

Not sure how this should interact with chained GDV... thoughts?
Likewise, this has the potential to greatly increase code size.... how should we think about that?

So for now this PR doesn't change anything other than prints more info in JitDump. I'd like to start from the case where the second most popular type uses the same method as the first one. From what I see there are ~2000 such virtual calls in AvaloniaUI app. I hope it won't cause any problem for things like "chained GDV" since I want it to emit something like:

single pMT check:

JTRUE
   EQ
     IND(LCL) ;; pMT
     CNS

two pMT checks:

JTRUE
  NE
    OR
      EQ
        IND(LCL)
        CNS ;; Type1
      EQ
        IND(LCL)
        CNS ;; Type2
    CNS 0

so it won't produce any new basic block and should be still thread-jump friendly. I hope it won't affect binary-size/memory footprint much or we can always make inliner a bit less aggressive if overall performance will be better 🙂

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@EgorBo
Copy link
Member Author

EgorBo commented Sep 12, 2021

@jakobbotsch btw, any idea what clrjitilc is?

@jakobbotsch
Copy link
Member

@jakobbotsch btw, any idea what clrjitilc is?

I think it's just the internal name used by crossgen2 for the JIT native library. It redirects to the proper dll here:

NativeLibrary.SetDllImportResolver(typeof(CorInfoImpl).Assembly, (libName, assembly, searchPath) =>
{
IntPtr libHandle = IntPtr.Zero;
if (libName == CorInfoImpl.JitLibrary)
{
if (!string.IsNullOrEmpty(jitPath))
{
libHandle = NativeLibrary.Load(jitPath);
}
else
{
libHandle = NativeLibrary.Load("clrjit_" + GetTargetSpec(target), assembly, searchPath);
}
}
if (libName == CorInfoImpl.JitSupportLibrary)
{
libHandle = NativeLibrary.Load("jitinterface_" + RuntimeInformation.ProcessArchitecture.ToString().ToLowerInvariant(), assembly, searchPath);
}
return libHandle;
});

So since you already renamed the export and pinvoke name I think it's fine.

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@EgorBo
Copy link
Member Author

EgorBo commented Sep 13, 2021

So since you already renamed the export and pinvoke name I think it's fine.

For some reason CI still screams it can't find that entrypoint in clrjitilc while locally it builds just fine :|

@jakobbotsch
Copy link
Member

jakobbotsch commented Sep 13, 2021

For some reason CI still screams it can't find that entrypoint in clrjitilc while locally it builds just fine :|

That's weird, I wonder if it's somehow using an outdated JIT dll (maybe from the sdk)? It looks like it's only happening on non-Windows builds. @davidwrighton do you have an idea what could be going on here?

src/coreclr/jit/likelyclass.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/likelyclass.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/likelyclass.cpp Outdated Show resolved Hide resolved
@davidwrighton
Copy link
Member

@EgorBo can you describe the format change to the PgoSchema data. In particular, as this is part of the static pgo format, we need to be certain that the new parser can read the old data.

Also, I think I found the spot where you failed to mark the symbol as a public symbol. See my comment above.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 17, 2021

@EgorBo can you describe the format change to the PgoSchema data. In particular, as this is part of the static pgo format, we need to be certain that the new parser can read the old data.

Also, I think I found the spot where you failed to mark the symbol as a public symbol. See my comment above.

From my understanding, it doesn't change the format anyhow. ComputeLikelyClass will still save only a single (most popular) class under PgoInstrumentationKind.GetLikelyClass - I left it this way because storing multiple classes there will increase profile size (this PR doesn't use that data anyhow other than printing via JITDUMP yet).

src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/likelyclass.cpp Show resolved Hide resolved
@EgorBo
Copy link
Member Author

EgorBo commented Sep 17, 2021

@AndyAyersMS Thanks, addressed.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Just a few nits.

I don't suppose we can look at diffs easily. Make sure you spot-check a few cases by hand I guess.

src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
@AndyAyersMS
Copy link
Member

I'm confused why you are seeing null handles.

The LikelyClassHistogramconstructor should be filtering them out, and we should never be creating a GetLikelyClass schema record with a null handle.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 19, 2021

I'm confused why you are seeing null handles.

The LikelyClassHistogramconstructor should be filtering them out, and we should never be creating a GetLikelyClass schema record with a null handle.

Thanks! So if a histogram contains a few (or even a single) nulls - does it mean such a callsite is super rare and is not worth devirtualizing? (However, I guess its block will be cold and we'll ignore it anyway)

@AndyAyersMS
Copy link
Member

The runtime helpers for class profiles perform what's known as reservoir sampling. The data we track includes a count of the number of times the call site was hit.

The histogram table can have 0..8 non-null entries with current table size. They can only all be null if the counter is zero, meaning this call site was never hit at runtime (this is fairly common).

There may be "trailing" nulls if the call site is lightly hit (count 1...7).

If the call site is more frequently hit all entries should be non-null.

The same handle may appear in more than one slot in the table, which is how we deduce its relative likelihood.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unsure on one point.

If the observations include collectible classes we put in placeholders (potentially up to 32 different values, though we currently just always use one). And we want the presence of those placeholders to show up in the estimated number of classes seen.

However, we filter the placeholders from the output array, so there's no way the count of elements in the output array can represent the estimated number of classes.

Am I missing something here...?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 23, 2021

I'm still unsure on one point.

If the observations include collectible classes we put in placeholders (potentially up to 32 different values, though we currently just always use one). And we want the presence of those placeholders to show up in the estimated number of classes seen.

However, we filter the placeholders from the output array, so there's no way the count of elements in the output array can represent the estimated number of classes.

Am I missing something here...?

@AndyAyersMS Sorry for the delay,

so currently for numberOfClasses we return the number of only known handles but we do take them into the account when we calculate likelihood (it was like that even before my PR). I did a quick test with ALCs:
image

for this Test getLikelyClasses return 2 ([Class1, 50%], [Class2, 25%]) - so we took the dynamic one into the account here.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 23, 2021

Failures aren't related (fail on other PRs too)

@EgorBo EgorBo merged commit 43740ce into dotnet:main Sep 24, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants