Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port IsRedundantMov to xarch #54075

Merged
merged 3 commits into from
Jun 13, 2021
Merged

Conversation

tannergooding
Copy link
Member

This ports the emitter::IsRedundantMov method that exists for Arm64 to also exist for x86/x64.

The PMI diff reports are below and this namely impacts two cases.

First, the following looks to occur in most IL_STUB_PInvoke methods:

        movzx    rax, al
-       movzx    rax, al

Second, this resolves #49535 by avoiding the shuffle back/forth:

        mov      rsi, rax
-       mov      rax, rsi

There are a couple TODO-CQ-XArch left in the comments as we should be able to utilize AreUpper32BitsZero and an equivalent method for SIMD instructions to elide a few additional variations.

jit-diff diff --diff --pmi --frameworks
Found 346 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 53960593
Total bytes of diff: 53959385
Total bytes of delta: -1208 (-0.00% of base)
    diff is an improvement.


Top file improvements (bytes):
        -298 : System.Private.CoreLib.dasm (-0.01% of base)
        -151 : FSharp.Core.dasm (-0.00% of base)
         -53 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)
         -46 : System.DirectoryServices.dasm (-0.01% of base)
         -43 : System.Console.dasm (-0.08% of base)
         -39 : System.Security.Cryptography.X509Certificates.dasm (-0.02% of base)
         -30 : System.DirectoryServices.AccountManagement.dasm (-0.01% of base)
         -24 : System.Data.OleDb.dasm (-0.01% of base)
         -24 : System.Collections.dasm (-0.00% of base)
         -24 : System.Security.Cryptography.Cng.dasm (-0.01% of base)
         -24 : System.Diagnostics.Process.dasm (-0.03% of base)
         -21 : Microsoft.Win32.SystemEvents.dasm (-0.09% of base)
         -20 : System.Threading.Tasks.Dataflow.dasm (-0.00% of base)
         -20 : System.Data.Odbc.dasm (-0.01% of base)
         -17 : Microsoft.CodeAnalysis.dasm (-0.00% of base)
         -15 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.00% of base)
         -15 : System.Security.AccessControl.dasm (-0.02% of base)
         -15 : System.Security.Cryptography.Algorithms.dasm (-0.00% of base)
         -15 : System.Drawing.Common.dasm (-0.00% of base)
         -15 : System.Security.Cryptography.Pkcs.dasm (-0.00% of base)

74 total files with Code Size differences (74 improved, 0 regressed), 198 unchanged.

Top method improvements (bytes):
         -25 (-1.69% of base) : FSharp.Core.dasm - FSharpOption`1:Equals(Object,IEqualityComparer):bool:this (8 methods)
         -24 (-1.67% of base) : FSharp.Core.dasm - FSharpList`1:Equals(Object,IEqualityComparer):bool:this (8 methods)
         -24 (-2.05% of base) : FSharp.Core.dasm - FSharpList`1:Equals(FSharpList`1):bool:this (8 methods)
         -24 (-1.10% of base) : System.Private.CoreLib.dasm - HashSet`1:IsProperSupersetOf(IEnumerable`1):bool:this (8 methods)
         -24 (-1.11% of base) : System.Private.CoreLib.dasm - HashSet`1:SetEquals(IEnumerable`1):bool:this (8 methods)
         -24 (-1.02% of base) : System.Collections.dasm - SortedSet`1:IsSubsetOf(IEnumerable`1):bool:this (8 methods)
         -21 (-0.64% of base) : FSharp.Core.dasm - FSharpChoice`2:Equals(Object,IEqualityComparer):bool:this (8 methods)
         -21 (-0.71% of base) : FSharp.Core.dasm - FSharpChoice`2:Equals(FSharpChoice`2):bool:this (8 methods)
         -21 (-3.14% of base) : FSharp.Core.dasm - FSharpOption`1:Equals(FSharpOption`1):bool:this (8 methods)
         -19 (-0.83% of base) : System.Private.CoreLib.dasm - Number:TryParseNumber(byref,long,int,byref,NumberFormatInfo):bool
         -17 (-0.21% of base) : System.Threading.Tasks.Dataflow.dasm - BatchBlockTargetCore:ConsumeReservedMessagesNonGreedy():this (8 methods)
         -11 (-1.39% of base) : Microsoft.CodeAnalysis.dasm - SmallDictionary`2:Insert(int,__Canon,Nullable`1,bool):this
          -9 (-1.53% of base) : System.Private.CoreLib.dasm - ILStubClass:IL_STUB_PInvoke(String):bool (3 methods)
          -9 (-1.47% of base) : System.Private.CoreLib.dasm - ILStubClass:IL_STUB_PInvoke(String,int):bool (3 methods)
          -9 (-1.70% of base) : System.Private.CoreLib.dasm - ILStubClass:IL_STUB_PInvoke(long,long,int):bool (3 methods)
          -9 (-1.76% of base) : System.Diagnostics.Process.dasm - ILStubClass:IL_STUB_PInvoke(long):bool (3 methods)
          -7 (-3.63% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Binder:IsAnonymousTypesAllowed():bool:this
          -7 (-3.87% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MethodTypeInferrer:DoesOutputTypeContain(BoundExpression,TypeSymbol,TypeParameterSymbol):bool
          -7 (-2.50% of base) : Microsoft.CSharp.dasm - TypeUtils:IsImplicitNumericConversion(Type,Type):bool
          -7 (-0.85% of base) : System.Net.Primitives.dasm - IPv6AddressHelper:IsValidStrict(long,int,byref):bool

Top method improvements (percentages):
          -6 (-5.66% of base) : System.Private.CoreLib.dasm - Interlocked:And(byref,long):long (2 methods)
          -6 (-5.66% of base) : System.Private.CoreLib.dasm - Interlocked:Or(byref,long):long (2 methods)
          -3 (-3.90% of base) : System.Private.CoreLib.dasm - UTF8Marshaler:ConvertToManaged(long):String
          -7 (-3.87% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MethodTypeInferrer:DoesOutputTypeContain(BoundExpression,TypeSymbol,TypeParameterSymbol):bool
          -3 (-3.85% of base) : System.Private.CoreLib.dasm - DecCalc:Div96ByConst(byref,byref,int):bool
          -3 (-3.80% of base) : System.Private.Uri.dasm - UriParser:InternalValidate(Uri,byref):this
          -7 (-3.63% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Binder:IsAnonymousTypesAllowed():bool:this
          -3 (-3.49% of base) : System.Runtime.Numerics.dasm - BigInteger:Equals(BigInteger):bool:this
          -3 (-3.23% of base) : System.DirectoryServices.Protocols.dasm - Utility:IsResultCode(int):bool
         -21 (-3.14% of base) : FSharp.Core.dasm - FSharpOption`1:Equals(FSharpOption`1):bool:this (8 methods)
          -3 (-2.73% of base) : System.Text.RegularExpressions.dasm - RegexNode:IsAtomicByParent():bool:this
          -4 (-2.53% of base) : System.Console.dasm - ILStubClass:IL_STUB_PInvoke(int):short
          -4 (-2.52% of base) : System.Data.Odbc.dasm - ILStubClass:IL_STUB_PInvoke(long):short
          -7 (-2.50% of base) : Microsoft.CSharp.dasm - TypeUtils:IsImplicitNumericConversion(Type,Type):bool
          -4 (-2.42% of base) : System.Data.Odbc.dasm - ILStubClass:IL_STUB_PInvoke(short,long):short
          -3 (-2.34% of base) : System.Private.CoreLib.dasm - IListWrapperEnumWrapper:MoveNext():bool:this
          -4 (-2.30% of base) : System.Data.Odbc.dasm - ILStubClass:IL_STUB_PInvoke(short,long,short):short
          -7 (-2.28% of base) : System.Net.Primitives.dasm - <System-Collections-Generic-IEnumerable<System-Net-Cookie>-GetEnumerator>d__32:MoveNext():bool:this
          -4 (-2.25% of base) : System.Data.Odbc.dasm - ILStubClass:IL_STUB_PInvoke(long,int,long,int):short
          -6 (-2.20% of base) : System.Reflection.MetadataLoadContext.dasm - Assignability:ProvablyAGcReferenceTypeHelper(Type,CoreTypes):bool

293 total methods with Code Size differences (293 improved, 0 regressed), 271640 unchanged.
jit-diff diff --diff --pmi --benchmarks
Found 83 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 506793
Total bytes of diff: 506793
Total bytes of delta: 0 (0.00% of base)


0 total files with Code Size differences (0 improved, 0 regressed), 82 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 1901 unchanged.
jit-diff diff --diff --pmi --tests
Found 4201 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 139783297
Total bytes of diff: 139780863
Total bytes of delta: -2434 (-0.00% of base)
    diff is an improvement.


Top file improvements (bytes):
        -114 : Interop\PInvoke\Array\MarshalArrayAsParam\AsDefault\AsDefaultTest\AsDefaultTest.dasm (-0.39% of base)
        -114 : Interop\PInvoke\Array\MarshalArrayAsParam\AsLPArray\AsLPArrayTest\AsLPArrayTest.dasm (-0.36% of base)
         -75 : Interop\StructMarshalling\PInvoke\MarshalStructAsLayoutSeq\MarshalStructAsLayoutSeq.dasm (-0.08% of base)
         -36 : Interop\StructMarshalling\ReversePInvoke\MarshalSeqStruct\DelegatePInvoke\DelegatePInvokeTest\DelegatePInvokeTest.dasm (-0.05% of base)
         -36 : Interop\PInvoke\Vector2_3_4\Vector2_3_4\Vector2_3_4.dasm (-0.35% of base)
         -12 : Interop\PInvoke\Decimal\DecimalTest\DecimalTest.dasm (-0.12% of base)
          -9 : Interop\LayoutClass\LayoutClassTest\LayoutClassTest.dasm (-0.12% of base)
          -9 : Interop\SuppressGCTransition\SuppressGCTransitionTest\SuppressGCTransitionTest.dasm (-0.13% of base)
          -9 : Interop\IJW\NativeVarargs\NativeVarargsTest\IjwNativeVarargs.dasm (-0.02% of base)
          -6 : Interop\ICustomMarshaler\Coreclr.TestWrapper.dasm (-0.06% of base)
          -6 : Interop\DllImportAttribute\Coreclr.TestWrapper.dasm (-0.06% of base)
          -6 : JIT\Math\Coreclr.TestWrapper.dasm (-0.06% of base)
          -6 : Interop\NativeLibrary\Coreclr.TestWrapper.dasm (-0.06% of base)
          -6 : JIT\Intrinsics\Coreclr.TestWrapper.dasm (-0.06% of base)
          -6 : JIT\Methodical\fp\exgen\10w5d_cs_do\10w5d_cs_do.dasm (-0.00% of base)
          -6 : JIT\Methodical\fp\exgen\10w5d_cs_ro\10w5d_cs_ro.dasm (-0.00% of base)
          -6 : JIT\SIMD\Coreclr.TestWrapper.dasm (-0.06% of base)
          -6 : JIT\CheckProjects\Coreclr.TestWrapper.dasm (-0.06% of base)
          -6 : JIT\RyuJIT\Coreclr.TestWrapper.dasm (-0.06% of base)
          -6 : Interop\UnmanagedCallConv\UnmanagedCallConvTest\PInvokesIL.dasm (-0.51% of base)

616 total files with Code Size differences (616 improved, 0 regressed), 2958 unchanged.

Top method improvements (bytes):
        -108 (-0.67% of base) : Interop\PInvoke\Array\MarshalArrayAsParam\AsDefault\AsDefaultTest\AsDefaultTest.dasm - ILStubClass:IL_STUB_PInvoke(ref,int):bool (55 methods)
        -108 (-0.65% of base) : Interop\PInvoke\Array\MarshalArrayAsParam\AsLPArray\AsLPArrayTest\AsLPArrayTest.dasm - ILStubClass:IL_STUB_PInvoke(ref,int):bool (56 methods)
         -36 (-0.17% of base) : Interop\StructMarshalling\PInvoke\MarshalStructAsLayoutSeq\MarshalStructAsLayoutSeq.dasm - ILStubClass:IL_STUB_PInvoke(byref):bool (52 methods)
         -18 (-0.15% of base) : Interop\StructMarshalling\ReversePInvoke\MarshalSeqStruct\DelegatePInvoke\DelegatePInvokeTest\DelegatePInvokeTest.dasm - ILStubClass:IL_STUB_PInvoke(byref):bool (28 methods)
         -12 (-1.69% of base) : Interop\StructMarshalling\PInvoke\MarshalStructAsLayoutSeq\MarshalStructAsLayoutSeq.dasm - ILStubClass:IL_STUB_PInvoke(IncludeOuterIntegerStructSequential):bool (4 methods)
         -12 (-1.60% of base) : Interop\StructMarshalling\PInvoke\MarshalStructAsLayoutSeq\MarshalStructAsLayoutSeq.dasm - ILStubClass:IL_STUB_PInvoke(S11):bool (4 methods)
         -12 (-1.12% of base) : Interop\StructMarshalling\PInvoke\MarshalStructAsLayoutSeq\MarshalStructAsLayoutSeq.dasm - ILStubClass:IL_STUB_PInvoke(NumberSequential):bool (4 methods)
          -9 (-1.60% of base) : Interop\LayoutClass\LayoutClassTest\LayoutClassTest.dasm - ILStubClass:IL_STUB_PInvoke(SealedBlittable):bool (3 methods)
          -9 (-1.54% of base) : Interop\PInvoke\Decimal\DecimalTest\DecimalTest.dasm - ILStubClass:IL_STUB_PInvoke(Decimal,int):bool (3 methods)
          -6 (-1.80% of base) : Interop\UnmanagedCallConv\UnmanagedCallConvTest\PInvokesIL.dasm - ILStubClass:IL_STUB_PInvoke(int,long):bool (2 methods)
          -6 (-1.69% of base) : Interop\StructMarshalling\ReversePInvoke\MarshalSeqStruct\DelegatePInvoke\DelegatePInvokeTest\DelegatePInvokeTest.dasm - ILStubClass:IL_STUB_PInvoke(IncludeOuterIntergerStructSequential):bool (2 methods)
          -6 (-1.60% of base) : Interop\StructMarshalling\ReversePInvoke\MarshalSeqStruct\DelegatePInvoke\DelegatePInvokeTest\DelegatePInvokeTest.dasm - ILStubClass:IL_STUB_PInvoke(S11):bool (2 methods)
          -6 (-1.12% of base) : Interop\StructMarshalling\ReversePInvoke\MarshalSeqStruct\DelegatePInvoke\DelegatePInvokeTest\DelegatePInvokeTest.dasm - ILStubClass:IL_STUB_PInvoke(NumberSequential):bool (2 methods)
          -6 (-1.80% of base) : Interop\UnmanagedCallConv\PInvokesIL\PInvokesIL.dasm - ILStubClass:IL_STUB_PInvoke(int,long):bool (2 methods)
          -6 (-1.33% of base) : Interop\PInvoke\Vector2_3_4\Vector2_3_4\Vector2_3_4.dasm - ILStubClass:IL_STUB_PInvoke(byref,float,float,float,float):bool (2 methods)
          -6 (-1.21% of base) : Interop\PInvoke\Vector2_3_4\Vector2_3_4\Vector2_3_4.dasm - ILStubClass:IL_STUB_PInvoke(byref,float,float,float,float,float,float):bool (2 methods)
          -6 (-1.12% of base) : Interop\PInvoke\Vector2_3_4\Vector2_3_4\Vector2_3_4.dasm - ILStubClass:IL_STUB_PInvoke(byref,float,float,float,float,float,float,float,float):bool (2 methods)
          -6 (-1.30% of base) : Interop\SuppressGCTransition\SuppressGCTransitionTest\SuppressGCTransitionTest.dasm - ILStubClass:IL_STUB_PInvoke(long,long):bool (4 methods)
          -6 (-1.29% of base) : Interop\UnmanagedCallConv\UnmanagedCallConvTest\UnmanagedCallConvTest.dasm - ILStubClass:IL_STUB_PInvoke(int,long):bool (4 methods)
          -6 (-1.61% of base) : Interop\PInvoke\Array\MarshalArrayAsParam\AsDefault\AsDefaultTest\AsDefaultTest.dasm - ILStubClass:IL_STUB_PInvoke(ref):bool (2 methods)

Top method improvements (percentages):
          -3 (-10.00% of base) : JIT\CodeGenBringUpTests\DivConst_do\DivConst_do.dasm - DivConst:I8_Div_3(long):long
          -3 (-10.00% of base) : JIT\CodeGenBringUpTests\DivConst_ro\DivConst_ro.dasm - DivConst:I8_Div_3(long):long
          -3 (-1.85% of base) : Interop\ICustomMarshaler\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : Interop\DllImportAttribute\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : JIT\Math\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : Interop\NativeLibrary\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : JIT\Intrinsics\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : JIT\SIMD\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : JIT\CheckProjects\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : JIT\RyuJIT\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : Interop\UnmanagedCallConv\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : Interop\WinRT\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : JIT\Stress\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : Interop\ICastable\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : Interop\IJW\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : JIT\IL_Conformance\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : Interop\COM\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : JIT\Methodical\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : Interop\ExecInDefAppDom\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool
          -3 (-1.85% of base) : JIT\Directed\Coreclr.TestWrapper.dasm - ILStubClass:IL_STUB_PInvoke(long):bool

700 total methods with Code Size differences (700 improved, 0 regressed), 494153 unchanged.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 11, 2021
@tannergooding
Copy link
Member Author

CC. @AndyAyersMS

@AndyAyersMS
Copy link
Member

Linking to my earlier attempt, which I abandoned once we'd improved LSRA preferencing: dotnet/coreclr#21959 because

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some suggestions for comments / naming.

// size - Operand size of current instruction
// dst - The current destination
// src - The current source
// canSkip - The move can be skipped as it doesn't represent special semantics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming this to better convey what it means, something like canIgnoreSideEffects?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed this in IsRedundantMov but not everywhere else (as /* canSkip */ value is used for almost every call to emitIns_Mov).

@@ -4253,7 +4253,6 @@ bool emitter::IsMovInstruction(instruction ins)
case INS_movd:
case INS_movdqa:
case INS_movdqu:
case INS_movsd:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're removing movsd perhaps leave a comment describing why other "move-like" instructions aren't included here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also commenting nit: can you add a return value comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a return value and remarks comment covering why this doesn't cover all "move like" instructions:

//
// Return Value:
//    true if the instruction is a qualifying move instruction; otherwise, false
//
// Remarks:
//    This methods covers most kinds of two operand move instructions that copy a
//    value between two registers. It does not cover all move-like instructions
//    and so doesn't currently cover things like movsb/movsw/movsd/movsq or cmovcc
//    and doesn't currently cover cases where a value is read/written from memory.
//
//    The reason it doesn't cover all instructions was namely to limit the scope
//    of the initial change to that which was impactful to move elision so that
//    it could be centrally managed and optimized. It may be beneficial to support
//    the other move instructions in the future but that may require more extensive
//    changes to ensure relevant codegen/emit paths flow and check things correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TL;DR is just that it was only covering the move instructions that were impactful to move elision initially. Other move instructions have side effects (like being non-temporal or dealing with memory) and so weren't necessary to cover.

@tannergooding tannergooding merged commit feae079 into dotnet:main Jun 13, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redundant mov instructions generated
2 participants