Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal code generated for even a trivial ValueConverter #85570

Closed
hez2010 opened this issue Apr 30, 2023 · 9 comments
Closed

Suboptimal code generated for even a trivial ValueConverter #85570

hez2010 opened this issue Apr 30, 2023 · 9 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:2 Work that is important, but not critical for the release tenet-performance Performance related issue
Milestone

Comments

@hez2010
Copy link
Contributor

hez2010 commented Apr 30, 2023

Description

I'm seeing suboptimal code generated for ValueConverters even when what the converter does is trivial.

For example,

class Program
{
    private static readonly Foo foo = new();
    private static readonly CultureInfo culture = CultureInfo.CurrentCulture;

    public int Test()
    {
        var x = 3;
        var y = (int)foo.Convert(x, typeof(int), null, culture);
        return y;
    }
}

interface IValueConverter
{
    object Convert(object value, Type targetType, object? parameter, CultureInfo? culture);
}

sealed class Foo : IValueConverter
{
    public object Convert(object value, Type targetType, object? parameter, CultureInfo? culture)
    {
        return (int)value + 1;
    }
}

The codegen:

; Assembly listing for method Program:Test():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data

G_M000_IG01:                ;; offset=0000H
       push     rdi
       push     rsi
       sub      rsp, 40
 
G_M000_IG02:                ;; offset=0006H
       mov      rsi, 0x7FF7BD400858
       mov      rcx, rsi
       call     CORINFO_HELP_NEWSFAST
       mov      rdi, rax
       test     byte  ptr [(reloc 0x7ff7bd6f9344)], 1
       je       SHORT G_M000_IG05
 
G_M000_IG03:                ;; offset=0024H
       mov      rcx, 0x23591401DA8
       mov      rcx, gword ptr [rcx]
       mov      dword ptr [rdi+08H], 3
       cmp      byte  ptr [rcx], cl
       mov      rcx, rsi
       call     CORINFO_HELP_NEWSFAST
       mov      ecx, dword ptr [rdi+08H]
       inc      ecx
       mov      dword ptr [rax+08H], ecx
       mov      eax, dword ptr [rax+08H]
 
G_M000_IG04:                ;; offset=004DH
       add      rsp, 40
       pop      rsi
       pop      rdi
       ret      
 
G_M000_IG05:                ;; offset=0054H
       mov      rcx, 0x7FF7BD6F9310
       mov      edx, 4
       call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       jmp      SHORT G_M000_IG03
 
; Total bytes of code 106

This is important because ValueConverters are heavily used in Desktop apps that use XAML (WPF, UWP, MAUI and etc.), and we want to minimize the overhead caused by ValueConverters.

Expected codegen:

mov eax, 4
ret

Configuration

.NET 8 preview 3

@hez2010 hez2010 added the tenet-performance Performance related issue label Apr 30, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 30, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Apr 30, 2023
@ghost
Copy link

ghost commented Apr 30, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

I'm seeing suboptimal code generated for ValueConverters even when the what the converter does is trivial.

For example,

class Program
{
    private static readonly Foo foo = new();
    private static readonly CultureInfo culture = CultureInfo.CurrentCulture;

    public int Test()
    {
        var x = 3;
        var y = (int)foo.Convert(x, typeof(int), null, culture);
        return y;
    }
}

interface IValueConverter
{
    object Convert(object value, Type targetType, object? parameter, CultureInfo? culture);
}

sealed class Foo : IValueConverter
{
    public object Convert(object value, Type targetType, object? parameter, CultureInfo? culture)
    {
        return (int)value + 1;
    }
}

The codegen:

; Assembly listing for method Program:Test():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data

G_M000_IG01:                ;; offset=0000H
       push     rdi
       push     rsi
       sub      rsp, 40
 
G_M000_IG02:                ;; offset=0006H
       mov      rsi, 0x7FF7BD400858
       mov      rcx, rsi
       call     CORINFO_HELP_NEWSFAST
       mov      rdi, rax
       test     byte  ptr [(reloc 0x7ff7bd6f9344)], 1
       je       SHORT G_M000_IG05
 
G_M000_IG03:                ;; offset=0024H
       mov      rcx, 0x23591401DA8
       mov      rcx, gword ptr [rcx]
       mov      dword ptr [rdi+08H], 3
       cmp      byte  ptr [rcx], cl
       mov      rcx, rsi
       call     CORINFO_HELP_NEWSFAST
       mov      ecx, dword ptr [rdi+08H]
       inc      ecx
       mov      dword ptr [rax+08H], ecx
       mov      eax, dword ptr [rax+08H]
 
G_M000_IG04:                ;; offset=004DH
       add      rsp, 40
       pop      rsi
       pop      rdi
       ret      
 
G_M000_IG05:                ;; offset=0054H
       mov      rcx, 0x7FF7BD6F9310
       mov      edx, 4
       call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       jmp      SHORT G_M000_IG03
 
; Total bytes of code 106

This is important because ValueConverters are heavily used in Desktop apps that uses XAML (WPF, UWP, MAUI and etc.), and we want to minimize the overhead caused by ValueConverters.

Expected codegen:

mov eax, 4
ret

Configuration

.NET 8 preview 3

Author: hez2010
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Apr 30, 2023

We call it "multi-use boxing" problem, #9118

@EgorBo
Copy link
Member

EgorBo commented Apr 30, 2023

cc @markples

@hez2010
Copy link
Contributor Author

hez2010 commented May 1, 2023

Note that the code in the original post is a trivial case. In real-world cases, there'll be some type tests in the converter to avoid casting exceptions.

For example,

if (value is int d) return d + 1;
else return default(int);

The return value is boxed.

@markples
Copy link
Member

markples commented May 1, 2023

Thank you for the example.

  • The order of the first allocation and type initialization check/call require the allocated object to be spilled. Of course, this is moot if the allocation is removed.
  • The input allocation is unnecessary.
  • The null check on foo (I think?) remains. I suppose readonly+nonnull in cctor could help here.
  • The output allocation is unnecessary, and in the more complicated example there are two allocations, one on each path, making it a bit more complicated. (Well, I assume so.. either the source or JIT could merge them.)

I'll include this in #9118.

@markples markples self-assigned this May 1, 2023
@markples markples removed the untriaged New issue has not been triaged by the area owner label May 1, 2023
@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone May 16, 2023
@markples markples added the Priority:2 Work that is important, but not critical for the release label Jun 6, 2023
@markples markples modified the milestones: 8.0.0, 9.0.0 Jul 18, 2023
@AndyAyersMS
Copy link
Member

At tier1 things are a bit simpler, we have

STMT00001 ( 0x002[E-] ... 0x027 )
               [000007] DA---------                         *  STORE_LCL_VAR ref    V02 tmp1         
               [000006] -----------                         \--*  ALLOCOBJ  ref   
               [000005] H----------                            \--*  CNS_INT(h) long   0x7fff521cfbb0 class System.Int32

***** BB01 [0000]
STMT00002 ( ??? ... ??? )
               [000012] DA--G------                         *  STORE_LCL_VAR ref    V03 tmp2         
               [000003] #---G------                         \--*  IND       ref   
               [000002] H----------                            \--*  CNS_INT(h) long   0x258e1c01d80 const ptr Fseq[foo]

***** BB01 [0000]
STMT00003 ( ??? ... ??? )
               [000011] nA--G------                         *  STOREIND  int   
               [000010] -----------                         +--*  ADD       byref 
               [000008] -----------                         |  +--*  LCL_VAR   ref    V02 tmp1         
               [000009] -----------                         |  \--*  CNS_INT   long   8
               [000004] -----------                         \--*  LCL_VAR   int    V00 loc0         

***** BB01 [0000]
STMT00010 ( ??? ... ??? )
               [000056] ---X-------                         *  NULLCHECK byte  
               [000013] -----------                         \--*  LCL_VAR   ref    V03 tmp2         

***** BB01 [0000]
STMT00008 ( INL01 @ 0x000[E-] ... ??? ) <- INLRT @ ???
               [000049] DA---------                         *  STORE_LCL_VAR ref    V06 tmp5         
               [000048] -----------                         \--*  ALLOCOBJ  ref   
               [000047] H----------                            \--*  CNS_INT(h) long   0x7fff521cfbb0 class System.Int32

***** BB01 [0000]
STMT00009 ( INL01 @ ??? ... ??? ) <- INLRT @ ???
               [000053] nA-XG------                         *  STOREIND  int   
               [000052] -----------                         +--*  ADD       byref 
               [000050] -----------                         |  +--*  LCL_VAR   ref    V06 tmp5         
               [000051] -----------                         |  \--*  CNS_INT   long   8
               [000046] ---XG------                         \--*  ADD       int   
               [000044] ---XG------                            +--*  IND       int   
               [000043] -----------                            |  \--*  ADD       byref 
               [000015] -----------                            |     +--*  BOX       ref   
               [000014] -----------                            |     |  \--*  LCL_VAR   ref    V02 tmp1         
               [000042] -----------                            |     \--*  CNS_INT   long   8
               [000045] -----------                            \--*  CNS_INT   int    1


***** BB01 [0000]
STMT00005 ( ??? ... ??? )
               [000025] DAC--------                         *  STORE_LCL_VAR ref    V04 tmp3         
               [000055] -----------                         \--*  BOX       ref   
               [000054] -----------                            \--*  LCL_VAR   ref    V06 tmp5         

***** BB01 [0000]
STMT00006 ( ??? ... ??? )
               [000035] --CXG------                         *  QMARK     void  
               [000029] ---X-------    if                   +--*  EQ        int   
               [000028] #--X-------                         |  +--*  IND       long  
               [000027] -----------                         |  |  \--*  LCL_VAR   ref    V04 tmp3         
               [000024] H----------                         |  \--*  CNS_INT(h) long   0x7fff521cfbb0 class System.Int32
               [000034] --CXG------    if                   \--*  COLON     void  
               [000032] --CXG------ else                       +--*  CALL help void   CORINFO_HELP_UNBOX
               [000031] H---------- arg0                       |  +--*  CNS_INT(h) long   0x7fff521cfbb0 class System.Int32
               [000026] ----------- arg1                       |  \--*  LCL_VAR   ref    V04 tmp3         
               [000033] ----------- then                       \--*  NOP       void  

***** BB01 [0000]
STMT00007 ( ??? ... ??? )
               [000039] ---XG------                         *  RETURN    int   
               [000038] ---XG------                         \--*  IND       int   
               [000037] -----------                            \--*  ADD       byref 
               [000030] -----------                               +--*  LCL_VAR   ref    V04 tmp3         
               [000036] -----------                               \--*  CNS_INT   long   8

Seems like VN ought to be able to resolve [000044] back to the value written in [000011] but it appears that since boxing and unboxing don't use field seqs VN's memory propagation is thwarted. And I suspect this is because we have no good way to represent boxed types or their fields.

If we could do this the first box allocation would become dead, and if we then did something similar at the return, so would the second one.

@EgorBo @jakobbotsch @SingleAccretion any thoughts here..?

@SingleAccretion
Copy link
Contributor

representing field sequences for boxed fields

I don't think it would be hard to add the ability to create 'synthetic' field sequences, such as for fields of boxes, it is 'just work'.

The one aspect we will need to be careful about is making sure the aliasing aspects are correct, i. e. that no synthetic field may alias any non-synthetic field, and that synthetic fields model shared generic code properly. I can see using the value type's type handle as the discriminator working here.

@AndyAyersMS
Copy link
Member

Codegen with #103361: no more boxing, just necessary side effect checks.

; Assembly listing for method Program:Main():int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 loc0         [V00    ] (  0,  0   )     int  ->  zero-ref    single-def
;  V01 OutArgs      [V01    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact "Single-def Box Helper" <System.Int32>
;  V03 tmp2         [V03,T00] (  2,  4   )     ref  ->  rax         class-hnd exact single-def "impImportAndPushBox" <Foo>
;* V04 tmp3         [V04    ] (  0,  0   )   byref  ->  zero-ref    "inline UNBOX clone1"
;* V05 tmp4         [V05    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Inlining Arg" <System.Int32>
;* V06 tmp5         [V06    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact "Single-def Box Helper" <System.Int32>
;* V07 tmp6         [V07    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SF] "stack allocated boxed value class temp" <System.Collections.Generic.KeyValuePair`2[long,int]>
;* V08 tmp7         [V08    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SF] "stack allocated boxed value class temp" <System.Collections.Generic.KeyValuePair`2[long,int]>
;* V09 tmp8         [V09    ] (  0,  0   )    long  ->  zero-ref    single-def "V07.[000..008)"
;* V10 tmp9         [V10    ] (  0,  0   )     int  ->  zero-ref    single-def "V07.[008..012)"
;* V11 tmp10        [V11    ] (  0,  0   )    long  ->  zero-ref    single-def "V08.[000..008)"
;* V12 tmp11        [V12    ] (  0,  0   )     int  ->  zero-ref    single-def "V08.[008..012)"
;* V13 cse0         [V13,T01] (  0,  0   )   byref  ->  zero-ref    "CSE #02: aggressive"
;
; Lcl frame size = 40

G_M24375_IG01:  ;; offset=0x0000
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25
G_M24375_IG02:  ;; offset=0x0004
       test     byte  ptr [(reloc 0x7ffe3ba8b4e0)], 1      ; global ptr
       je       SHORT G_M24375_IG05
						;; size=9 bbWeight=1 PerfScore 4.00
G_M24375_IG03:  ;; offset=0x000D
       mov      rax, 0x1EE92800BB8      ; data for Program:foo
       mov      rax, gword ptr [rax]
       cmp      byte  ptr [rax], al
       mov      eax, 4
						;; size=20 bbWeight=1 PerfScore 5.50
G_M24375_IG04:  ;; offset=0x0021
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
G_M24375_IG05:  ;; offset=0x0026
       mov      rcx, 0x7FFE3BA8B448      ; Program
       call     CORINFO_HELP_GET_GCSTATIC_BASE
       jmp      SHORT G_M24375_IG03
						;; size=17 bbWeight=0 PerfScore 0.00

; Total bytes of code 55, prolog size 4, PerfScore 11.00, instruction count 12, allocated bytes for code 55 (MethodHash=d9b8a0c8) for method Program:Main():int (FullOpts)
; ============================================================

AndyAyersMS added a commit that referenced this issue Jul 1, 2024
Enable object stack allocation for ref classes and extend the support to include boxed value classes. Use a specialized unbox helper for stack allocated boxes, both to avoid apparent escape of the box by the helper, and to ensure all box field accesses are visible to the JIT. Update the local address visitor to rewrite trees involving address of stack allocated boxes in some cases to avoid address exposure. Disable old promotion for stack allocated boxes (since we have no field handles) and allow physical promotion to enregister the box method table and/or payload as appropriate. In OSR methods handle the fact that the stack allocation may actually have been a heap allocation by the Tier0 method.

The analysis TP cost is around 0.4-0.7% (notes below). Boxes are much less likely to escape than ref classes (roughly ~90% of boxes escape, ~99.8% of ref classes escape). Codegen impact is diminished somewhat because many of the boxes are dead and were already getting optimized away.
 
Fixes #4584, #9118, #10195, #11192, #53585, #58554, #85570

---------

Co-authored-by: Jakob Botsch Nielsen <jakob.botsch.nielsen@gmail.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@AndyAyersMS
Copy link
Member

Fixed by #103361

@github-actions github-actions bot locked and limited conversation to collaborators Aug 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:2 Work that is important, but not critical for the release tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants