Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable matching constants for vectors that needs upper half to be save/restore #74110

Merged
merged 4 commits into from
Aug 18, 2022

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Aug 17, 2022

This reverts commit 24f5de4.

#70171 reuses Vector constants in register allocation but LSRA doesn't handle it very well. It doesn't have a semantics of looking into constant registers and save/restore the upper-half of them during the call. Because of this, the constant register that we want to reuse after the call gets garbage data in the upper-half and leads to silent functional differences.

There were several alternatives tried:

  • During buildUpperVectorSave() also check if there are any vector registers holding constants and add RefPosition to save/restore, but we don't have that information until the register allocation.
  • Piggyback on one of the RefTypeKill refposition and clear the constant vector registers mask during allocation, but we don't have the treeNode information for RefTypeKill to accurately do this only for Calls and not for anything else.

For now, I am reverting the change and handle it in better way as part of #70182.

For now, I am disabling the IsMatchingConstant() optimization for registers whose upper half need save/restore.

Fixes: #70642

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 17, 2022
@ghost ghost assigned kunalspathak Aug 17, 2022
@ghost
Copy link

ghost commented Aug 17, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This reverts commit 24f5de4.

#70171 reuses Vector constants in register allocation but LSRA doesn't handle it very well. It doesn't have a semantics of looking into constant registers and save/restore the upper-half of them during the call. Because of this, the constant register that we want to reuse after the call gets garbage data in the upper-half and leads to silent functional differences.

There were several alternatives tried:

  • During buildUpperVectorSave() also check if there are any vector registers holding constants and add RefPosition to save/restore, but we don't have that information until the register allocation.
  • Piggyback on one of the RefTypeKill refposition and clear the constant vector registers mask during allocation, but we don't have the treeNode information for RefTypeKill to accurately do this only for Calls and not for anything else.

For now, I am reverting the change and handle it in better way as part of #70182.

Fixes: #70642

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Member Author

@tannergooding @BruceForstall @dotnet/jit-contrib

@BruceForstall BruceForstall added this to the 7.0.0 milestone Aug 17, 2022
Comment on lines 2722 to 2725
case GT_CNS_VEC:
{
return GenTreeVecCon::Equals(refPosition->treeNode->AsVecCon(), otherTreeNode->AsVecCon());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of reverting, would it be feasible for this to be the following?

return (otherTreeNode->gtType != TYP_SIMD32) && GenTreeVecCon::Equals(refPosition->treeNode->AsVecCon(), otherTreeNode->AsVecCon());

Asking as the original change had some measured perf wins and since it looks like the bug only impacts TYP_SIMD32 due to the need to save/restore the upper halves (meaning I think it shouldn't impact TYP_SIMD8/12/16).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to scope it, we only need to do it for platforms where FEATURE_PARTIAL_SIMD_CALLEE_SAVE is defined (win-amd64, arm64, LoongArch64). Presumably there is no issue on x86?

varTypeNeedsPartialCalleeSave would need to be used.

Copy link
Member

@BruceForstall BruceForstall Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I still think it's probably "too late" to construct a fix to ensure correctness, on all platforms, when it's trivial to revert the change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

varTypeNeedsPartialCalleeSave check seems reasonable to include, but I agree with @BruceForstall that the simplest thing would be to revert the change at this point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, just thought I'd throw it out as a suggestion given this will have negative impact on Arm64.

I expect we'll see this one dotnet/perf-autofiling-issues#5976 and a few of the other code paths that use vector constants regress a bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the spmi-diffs and there are few in benchmarks. I tried the change to just use varTypeNeedsPartialCalleeSave and didn't see as many spmi-diffs in benchmarks, asp locally. I will let CI run and check other diffs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no change in spmi diff on Arm64 if we just disable isMatchingConstant for varTypeNeedsPartialCalleeSave. On x64, there is slight difference (earlier in benchmark collection, 12 methods regressed and now 5).

it looks like the bug only impacts TYP_SIMD32 due to the need to save/restore the upper halves (meaning I think it shouldn't impact TYP_SIMD8/12/16).

@tannergooding , even Arm64 needs save/restore upper half as per this comment, so it could be problematic there too, right?

static bool varTypeNeedsPartialCalleeSave(var_types type)
{
assert(type != TYP_STRUCT);
// ARM64 ABI FP Callee save registers only require Callee to save lower 8 Bytes
// For SIMD types longer than 8 bytes Caller is responsible for saving and restoring Upper bytes.
return ((type == TYP_SIMD16) || (type == TYP_SIMD12));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so it is (https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#simd-and-floating-point-registers):

Registers v8-v15 must be preserved by a callee across subroutine calls; the remaining registers (v0-v7, v16-v31) do not need to be preserved (or should be preserved by the caller). Additionally, only the bottom 64 bits of each value stored in v8-v15 need to be preserved [7]; it is the responsibility of the caller to preserve larger values.

I thought the entire register needed to be saved, but apparently not, that's unfortunate.

There is no change in spmi diff on Arm64 if we just disable isMatchingConstant for varTypeNeedsPartialCalleeSave. On x64, there is slight difference (earlier in benchmark collection, 12 methods regressed and now 5).

What was the regression on Arm64 if we revert the entire PR?

Getting the regression down to just 5 on x64 sounds 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the regression on Arm64 if we revert the entire PR?

There is no change between total revert and limiting it to just varTypeNeedsPartialCalleeSave on Arm64.

https://dev.azure.com/dnceng/public/_build/results?buildId=1951028&view=ms.vss-build-web.run-extensions-tab

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JulieLeeMSFT
Copy link
Member

CC @jeffschwMSFT, this is the bug fix for #70642, and we are reverting it.

@jeffschwMSFT
Copy link
Member

If wanted in 7, we need to backport to 7 RC1

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Please make sure you change the title and text of this PR to reflect the current change (it's no longer "revert". Same for the commit message when you merge.

@kunalspathak kunalspathak changed the title Revert #70171 "Ensure that GT_CNS_VEC is handled in ..." Disable matching constants for vectors that needs upper half to be save/restore Aug 18, 2022
@kunalspathak kunalspathak merged commit 64d27bb into dotnet:main Aug 18, 2022
@kunalspathak
Copy link
Member Author

/backport to release/7.0-rc1

@github-actions
Copy link
Contributor

Started backporting to release/7.0-rc1: https://github.com/dotnet/runtime/actions/runs/2885813879

@kunalspathak
Copy link
Member Author

@kunalspathak kunalspathak deleted the revert-vec-con branch August 18, 2022 23:22
@kunalspathak
Copy link
Member Author

@ghost ghost locked as resolved and limited conversation to collaborators Sep 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure JIT\\HardwareIntrinsics\\General\\Vector256\\Vector256_ro\\Vector256_ro.cmd
5 participants