Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Arm64 "constructed" constants in SuperPMI asm diffs #76616

Merged

Conversation

BruceForstall
Copy link
Member

@BruceForstall BruceForstall commented Oct 4, 2022

SuperPMI asm diffs tries to ignore constants that can change between multiple replays, such as addresses that the replay engine must generate and not simply hand back from the collected data.

Often, addresses have associated relocations generated during replay. SuperPMI can use these relocations to adjust the constants to allow two replays to match. However, there are cases on Arm64 where an address both doesn't report a relocation and is "constructed" using multiple mov/movk instructions.

One case is the allocPgoInstrumentationBySchema() API which returns a pointer to a PGO data buffer. An address within this buffer is constructed via a sequence such as:

mov     x0, #63408
movk    x0, #23602, lsl #16
movk    x0, #606, lsl #32

When SuperPMI replays this API, it constructs a new buffer and returns that pointer, which is used to construct various actual addresses that are generated as "constructed" constants, shown above.

This change "de-constructs" the constants and looks them up in the replay address map. If base and diff match the mapped constants, there is no asm diff.

Fixes #76347

SuperPMI asm diffs tries to ignore constants that can change between
multiple replays, such as addresses that the replay engine must generate
and not simply hand back from the collected data.

Often, addresses have associated relocations generated during replay.
SuperPMI can use these relocations to adjust the constants to allow
two replays to match. However, there are cases on Arm64 where an address
both doesn't report a relocation and is "constructed" using multiple
`mov`/`movk` instructions.

One case is the `allocPgoInstrumentationBySchema()`
API which returns a pointer to a PGO data buffer. An address within this
buffer is constructed via a sequence such as:
```
mov     x0, dotnet#63408
movk    x0, dotnet#23602, lsl dotnet#16
movk    x0, dotnet#606, lsl dotnet#32
```

When SuperPMI replays this API, it constructs a new buffer and returns that
pointer, which is used to construct various actual addresses that are
generated as "constructed" constants, shown above.

This change "de-constructs" the constants and looks them up in the replay
address map. If base and diff match the mapped constants, there is no asm diff.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 4, 2022
@ghost ghost assigned BruceForstall Oct 4, 2022
@ghost
Copy link

ghost commented Oct 4, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

SuperPMI asm diffs tries to ignore constants that can change between multiple replays, such as addresses that the replay engine must generate and not simply hand back from the collected data.

Often, addresses have associated relocations generated during replay. SuperPMI can use these relocations to adjust the constants to allow two replays to match. However, there are cases on Arm64 where an address both doesn't report a relocation and is "constructed" using multiple mov/movk instructions.

One case is the allocPgoInstrumentationBySchema() API which returns a pointer to a PGO data buffer. An address within this buffer is constructed via a sequence such as:

mov     x0, #63408
movk    x0, #23602, lsl #16
movk    x0, #606, lsl #32

When SuperPMI replays this API, it constructs a new buffer and returns that pointer, which is used to construct various actual addresses that are generated as "constructed" constants, shown above.

This change "de-constructs" the constants and looks them up in the replay address map. If base and diff match the mapped constants, there is no asm diff.

Author: BruceForstall
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr superpmi-asmdiffs-checked-release

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Copy link
Member Author

@AndyAyersMS @jakobbotsch @dotnet/jit-contrib PTAL

@jakobbotsch
Copy link
Member

Nice!
Is this orthogonal with #74718 (comment)?

@BruceForstall
Copy link
Member Author

Is this orthogonal with #74718 (comment)?

Yes, I think so. (Thanks; I hadn't seen that issue.)

Note that MethodContext::repAllocPgoInstrumentationBySchema() has the comment:

   // Todo, perhaps: record the buffer as a compile result instead, and defer copying until
   // jit completion so we can snapshot the offsets the jit writes.

I don't think we fully support 64-bit replay on 32-bit host, but this
fix at least makes it possible for this case.
@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr superpmi-asmdiffs-checked-release

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

addr1 += (DWORDLONG)con4_1 << 48;
addr2 += (DWORDLONG)con4_2 << 48;
}
}
Copy link
Member

@jakobbotsch jakobbotsch Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand right that this only works if both sides end up with the same number of instructions? For example, it is conceivable that one side may omit a movk because that section of the constant is all zeros.

Ultimately, I feel like we need to redesign some of the JIT-EE parts to make this robust. E.g. @EgorBo ended up needing to redesign the JIT-EE interface in #76112 in a way that ends up fixing #53773 for good, by splitting the addresses that JIT embeds in codegen from the notion of where the values are. This also helps support things in R2R and NAOT.

Can the same sort of fix not be applied to PGO? I already see that allocPgoInstrumentationBySchema returns a profileMemory pointer. Is anything stopping us from returning the same pointer to both JITs? From my cursory glance, it seems like it would work and it would make things completely deterministic. The pointer seems to only be used for codegen purposes, not accessed by the JIT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right: it's checking if the instruction stream is exactly the same modulo the constants, you're right that one movk could be omitted if it is zero, or perhaps (unlikely) one could have an address in the >48 bit range and one does not.

Is anything stopping us from returning the same pointer to both JITs?

The fact that two compilations are happening is unknown at this point in the code. In fact, we currently don't even remember we allocated the buffer (except to put it on a list to free).

We would need to record it as a map from ftnHnd => pointer. There's nothing preventing the JIT from calling the allocPgoInstrumentationBySchema (or similar API) multiple times and getting multiple different results.

I guess if the MethodContext remembers if it ever returned a pointer for a ftnHnd, then it can look up in the table and return the same pointer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that two compilations are happening is unknown at this point in the code. In fact, we currently don't even remember we allocated the buffer (except to put it on a list to free).

We would need to record it as a map from ftnHnd => pointer. There's nothing preventing the JIT from calling the allocPgoInstrumentationBySchema (or similar API) multiple times and getting multiple different results.

I guess if the MethodContext remembers if it ever returned a pointer for a ftnHnd, then it can look up in the table and return the same pointer.

That's true, it would need an SPMI update (and by extension, JIT-EE GUID update) to save profileMemory during recording. I think then keying it by ftnHnd would make sense, it is the relationship in the VM today because the VM needs to be able to hand the same pointer back for multiple versions of the same method.

Another benefit is that there are other things that can affect the codegen. Constant CSE could potentially affect the codegen depending on the addresses too. So it would be best if we get the same exact pointer that was returned at runtime to make sure we are seeing the exact same codegen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote:

I guess if the MethodContext remembers if it ever returned a pointer for a ftnHnd, then it can look up in the table and return the same pointer.

I was thinking about during replay.

So it would be best if we get the same exact pointer that was returned at runtime to make sure we are seeing the exact same codegen.

By "runtime" do you mean during collection time? I don't think that's possible in general because the JIT might want to write to or in some cases read from that address, and we can't guarantee the same pointer at replay as at collection time.

Copy link
Member

@jakobbotsch jakobbotsch Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "runtime" do you mean during collection time? I don't think that's possible in general because the JIT might want to write to or in some cases read from that address, and we can't guarantee the same pointer at replay as at collection time.

Yes, I mean the same pointer during replay as during collection. This is what I was saying previously. I think we should be trying to design the JIT-EE interface so that the JIT does not need to read directly from handles it is using for codegen purposes. This is necessary not just for SPMI, but usually also for R2R/NAOT where the handles are not actually memory addresses. @EgorBo is moving static field addresses to this model in #76112.

I think it is already the case with allocPgoInstrumentationBySchema today. From a cursory glance the JIT does not read or write to the returned profileMemory pointer. It is purely used for codegen purposes.

If the JIT wants to start reading or writing this pointer I think it should be accomplished with changes to the JIT-EE interface. That way we could also support instrumentation during prejit, for example. In those cases pointers like these are presumably going to be some opaque handle that cannot just be dereferenced.

Allow JIT1 and JIT2 to have a different sequence of
mov/movk[/movk[/movk]] that map to the same address in the
address map. That is, the replay constant might require a different
set of instructions (e.g., if a `movk` is missing because its constant
is zero).
@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr superpmi-asmdiffs-checked-release

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Copy link
Member Author

@jakobbotsch I've fixed the case where JIT1 and JIT2 could have a different sequence of mov/movk instructions.

I don't want to spend time at this point on any broader rearchitecting, such as adding a handle->pointer map for repAllocPgoInstrumentationBySchema; I want to unblock the Checked/Release asmdiffs pipeline and move on.

@jakobbotsch
Copy link
Member

@jakobbotsch I've fixed the case where JIT1 and JIT2 could have a different sequence of mov/movk instructions.

Does that work? Isn't the differ going to complain about the different number of instructions no matter what?

I don't want to spend time at this point on any broader rearchitecting, such as adding a handle->pointer map for repAllocPgoInstrumentationBySchema; I want to unblock the Checked/Release asmdiffs pipeline and move on.

Fair enough, but it may just become more flakey. I don't mind doing the work to fix this and #74718 together.
We can take the fix in the meantime, but I would rather substitute it for a more complete fix at a later date.

@BruceForstall
Copy link
Member Author

Does that work? Isn't the differ going to complain about the different number of instructions no matter what?

Ugh, yes :-( Maybe I'm too tired to code today :)

@BruceForstall
Copy link
Member Author

Does that work? Isn't the differ going to complain about the different number of instructions no matter what?

fwiw, we could engineer the code to skip the number of instructions we choose, if this is a problem.

@BruceForstall BruceForstall merged commit b303ffe into dotnet:main Oct 5, 2022
@BruceForstall BruceForstall deleted the SupportArm64InlineConstants branch October 5, 2022 14:49
@ghost ghost locked as resolved and limited conversation to collaborators Nov 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failures in checked/release asm diffs
2 participants