Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect JitEnableOptionalRelocs in TryReserveInitialMemory #107059

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Aug 27, 2024

Let's not try to allocate the loader heap near coreclr when JitEnableOptionalRelocs=0 is set since it seems to be pretty random whether we successfully do that or fallback to a random memory - that leads to inconsistent performance measurements. From what I see we are more often fail to do that anyway or arm64. It also not uncommon on Linux-x64 as well.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 28, 2024

PTAL @jkotas @janvorli

@jkotas
Copy link
Member

jkotas commented Aug 28, 2024

If you want the JitEnableOptionalRelocs=0 switch to be fully reliable, I do not think that tweaking how memory gets allocated is the right strategy. I won't ever make it fully reliable.

Instead, should this switch be telling the JIT and VM to always go through jump stubs, and to disable other optimizations that depend on address proximity instead?

@EgorBo
Copy link
Member Author

EgorBo commented Aug 28, 2024

If you want the JitEnableOptionalRelocs=0 switch to be fully reliable, I do not think that tweaking how memory gets allocated is the right strategy. I won't ever make it fully reliable.

Do you mean it might be still reachable even with random address? (extremely unlikely with arm64 I suppose).

Instead, should this switch be telling the JIT and VM to always go through jump stubs, and to disable other optimizations that depend on address proximity instead?

including all managed-to-managed as well? won't it have a heavy impact compared to default mode?

@jkotas
Copy link
Member

jkotas commented Aug 28, 2024

including all managed-to-managed as well? won't it have a heavy impact compared to default mode?

All these are opportunistic optimizations that can be non-deterministic, The switch documentation says that setting it to 0 should disable all of them:

// *Some* relocs are just opportunistic optimizations and can be non-deterministic - it might produce
// noise for jit-diff like tools.
RETAIL_CONFIG_DWORD_INFO(UNSUPPORTED_JitEnableOptionalRelocs, W("JitEnableOptionalRelocs"), 1, "Allow optional relocs")

It sounds like that you would like the switch to disable some of these, but not all of them. How do we decide what the switch should disable and what it should not disable?

@EgorBo
Copy link
Member Author

EgorBo commented Aug 29, 2024

@EgorBot -arm64 --envvars DOTNET_JitEnableOptionalRelocs:0

using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;

public class Bench
{
    [Benchmark]
    public void Test1() => Test2();

    [MethodImpl(MethodImplOptions.NoInlining)] void Test2() => Test3();
    [MethodImpl(MethodImplOptions.NoInlining)] void Test3() => Test4();
    [MethodImpl(MethodImplOptions.NoInlining)] void Test4(){}
}

@EgorBo
Copy link
Member Author

EgorBo commented Aug 29, 2024

@EgorBot -arm64 -perf --envvars DOTNET_JitEnableOptionalRelocs:0

using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;

public class Bench
{
    [Benchmark]
    public void Test1() => Test2();

    [MethodImpl(MethodImplOptions.NoInlining)] void Test2() => Test3();
    [MethodImpl(MethodImplOptions.NoInlining)] void Test3() => Test4();
    [MethodImpl(MethodImplOptions.NoInlining)] void Test4(){}
}

@EgorBot
Copy link

EgorBot commented Aug 29, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-MRFXNX : .NET 10.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-HSHRHX : .NET 10.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
EnvironmentVariables=DOTNET_JitEnableOptionalRelocs=0
Method Toolchain Mean Error Ratio
Test1 Main 2.382 ns 0.0011 ns 1.00
Test1 PR 2.394 ns 0.0018 ns 1.00

BDN_Artifacts.zip

@EgorBot
Copy link

EgorBot commented Aug 29, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-HMNEZL : .NET 10.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-CJCSYY : .NET 10.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
EnvironmentVariables=DOTNET_JitEnableOptionalRelocs=0
Method Toolchain Mean Error Ratio
Test1 Main 2.385 ns 0.0017 ns 1.00
Test1 PR 2.394 ns 0.0023 ns 1.00

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 29, 2024

@EgorBot -arm64 -intel -perf --envvars DOTNET_JitEnableOptionalRelocs:0

using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;

public class Bench
{
    [Benchmark]
    public void ManagedToManaged()
    {
        DoWork();
        DoWork();
        DoWork();
        DoWork();
    }
    [MethodImpl(MethodImplOptions.NoInlining)] void DoWork() { }

    object a;
    object b;

    [Benchmark]
    public void WriteBarrier() => a = b;
}

@EgorBot
Copy link

EgorBot commented Aug 29, 2024

Benchmark results on Intel
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  Job-OHQWDF : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
EnvironmentVariables=DOTNET_JitEnableOptionalRelocs=0
Method Toolchain Mean Error Ratio
ManagedToManaged Main 3.710 ns 0.0003 ns 1.00
ManagedToManaged PR NA NA ?
WriteBarrier Main 1.147 ns 0.0002 ns 1.00
WriteBarrier PR NA NA ?
Benchmarks with issues:
Bench.ManagedToManaged: Job-SKXYNU(EnvironmentVariables=DOTNET_JitEnableOptionalRelocs=0, Toolchain=PR)
Bench.WriteBarrier: Job-SKXYNU(EnvironmentVariables=DOTNET_JitEnableOptionalRelocs=0, Toolchain=PR)

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 31, 2024

Ah, managed-to-managed doesn't go through a jump-stub, but rather starts doing:

 00007fa6`cf831e2d        mov      rax, 0x7FA6CFC34348      ; code for Proga:Test()
 00007fa6`cf831e37        call     [rax]Proga:Test()

if it can't do a reloc'd call (on x64)

The perf impact seems to be around noise in microbenchmarks.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 31, 2024

@EgorBot -arm64 -perf

using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;

public class Bench
{
    [Benchmark]
    public void ManagedToManaged()
    {
        DoWork();
        DoWork();
        DoWork();
        DoWork();
    }
    [MethodImpl(MethodImplOptions.NoInlining)] void DoWork() { }

    object a;
    object b;

    [Benchmark]
    public void WriteBarrier() => a = b;
}

@EgorBot
Copy link

EgorBot commented Aug 31, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-IIRFOU : .NET 10.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-AGLXAB : .NET 10.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
Method Toolchain Mean Error Ratio
ManagedToManaged Main 4.175 ns 0.0010 ns 1.00
ManagedToManaged PR 4.223 ns 0.0016 ns 1.01
WriteBarrier Main 3.294 ns 0.0010 ns 1.00
WriteBarrier PR 3.281 ns 0.0007 ns 1.00

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBo EgorBo marked this pull request as draft September 4, 2024 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants