Assertion failed 'data->GetRegNum() != REG_WRITE_BARRIER_DST' #77143

Closed
markples opened this issue Oct 18, 2022 · 6 comments · Fixed by #81423

@markples
Member

runtime-coreclr libraries-pgo
Libraries Test Run checked coreclr Linux x64 Release

This assert has been hit at least twice in System.Runtime.Serialization.Xml.ReflectionOnly.Tests, though that test run usually hits #75827 instead.

https://dev.azure.com/dnceng-public/public/_build/results?buildId=52661&view=results
https://helix.dot.net/api/2019-06-17/jobs/6e3b9174-8eec-4bbf-8302-0033b34d4120/workitems/System.Runtime.Serialization.Xml.ReflectionOnly.Tests/console

https://dev.azure.com/dnceng-public/public/_build/results?buildId=52628&view=results
https://helix.dot.net/api/2019-06-17/jobs/fd3be114-dde0-4513-92ce-def4be1c2535/workitems/System.Runtime.Serialization.Xml.ReflectionOnly.Tests/console

+ grep COMPlus
+ printenv
COMPlus_JitRandomlyCollect64BitCounts=1
COMPlus_ReadyToRun=0
COMPlus_TC_QuickJitForLoops=1
COMPlus_JitRandomGuardedDevirtualization=1
COMPlus_TieredCompilation=1
COMPlus_JitRandomEdgeCounts=1
COMPlus_DbgMiniDumpName=/home/helixbot/dotnetbuild/dumps/coredump.%d.dmp
COMPlus_TieredPGO=1
COMPlus_DbgEnableMiniDump=1
...
  Starting:    System.Runtime.Serialization.Xml.ReflectionOnly.Tests (parallel test collections = on, max threads = 2)

Assert failure(PID 11707 [0x00002dbb], Thread: 11713 [0x2dc1]): Assertion failed 'data->GetRegNum() != REG_WRITE_BARRIER_DST' in 'System.Runtime.Serialization.XmlObjectSerializerWriteContext:SerializeWithoutXsiType(System.Runtime.Serialization.DataContracts.DataContract,System.Runtime.Serialization.XmlWriterDelegator,System.Object,System.RuntimeTypeHandle):this' during 'Generate code' (IL size 87; hash 0x9f1fa2ca; Tier1)

    File: /__w/1/s/src/coreclr/jit/codegenxarch.cpp Line: 5144
    Image: /datadisks/disk1/work/ADF20904/p/dotnet

cc @dotnet/jit-contrib

@markples added the area-CodeGen-coreclr and blocking-clean-ci-optional labels Oct 18, 2022
@markples added this to the 8.0.0 milestone Oct 18, 2022
@ghost

ghost commented Oct 18, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch
Member

Haven't seen this recently. This and #77141 might have been fixed by some other change. I will remove the blocking label for now and keep monitoring.

@jakobbotsch removed the blocking-clean-ci-optional label Nov 1, 2022
@jakobbotsch
Member

This reproed again this Sunday:
https://dev.azure.com/dnceng-public/public/_build/results?buildId=152015&view=ms.vss-test-web.build-test-results-tab&runId=3204063&resultId=196116&paneView=debug

Unfortunately I still haven't been able to get a repro of this.

@jakobbotsch
Member

jakobbotsch commented Jan 31, 2023

I was finally able to get a repro by setting DOTNET_TC_AggressiveTiering=1 in addition to all the other environment variables, and then running this in a loop for a bit.

Attached an SPMI context that repros it with unix-x64 JIT on commit dfe1076.

repro-27528.zip

@jakobbotsch
Member

Seems like LSRA gives an illegal register assignment:

N3111 (  3, 10) [003011] H---------Z                 t3011 =    CNS_INT(h) ref    REG rsi $b16
                                                            ┌──▌  t3011  ref    
               [004437] -----------                 t4437 = ▌  RELOAD    ref    REG rdi
N3113 (  3, 10) [001640] H----------                 t1640 =    CNS_INT(h) long   0x7f7ee0008440 static Fseq[<unknown field>] REG rsi $b14
                                                            ┌──▌  t1640  long   
                                                            ├──▌  t4437  ref    
N3115 (???,???) [003877] -A--G------                         ▌  STOREIND  ref    REG NA

The address is t1640, i.e. a constant. It is put into rsi, while the data, t4437, is put into rdi. But the write needs a write barrier, and the write barrier expects the address in rdi and the data in rsi.
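
To make the conflict concrete, here is a minimal sketch in C++ that emulates moving the address and data into the write barrier's argument registers. It is not the JIT's code: the `Reg` enum, `kBarrierDst`, `kBarrierSrc`, the placeholder values, and `ShuffleIsCorrect` are all hypothetical names, and the two-move sequence (address first, then data) is an assumption made only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical three-register machine; only rdi and rsi matter here.
enum Reg : int { RDI = 0, RSI = 1, RAX = 2, REG_COUNT = 3 };

// Taken from the comment above: the barrier wants the address in rdi
// and the data in rsi.
constexpr Reg kBarrierDst = RDI; // address argument
constexpr Reg kBarrierSrc = RSI; // data argument

using RegFile = std::array<std::uint64_t, REG_COUNT>;

// Emulates "mov dst, addrReg; mov src, dataReg" and reports whether the
// barrier would then see the intended address and data values.
bool ShuffleIsCorrect(Reg addrReg, Reg dataReg) {
    assert(addrReg != dataReg); // two live values cannot share a register

    const std::uint64_t addrValue = 0xA; // placeholder values, not real addresses
    const std::uint64_t dataValue = 0xD;

    RegFile regs{};
    regs[addrReg] = addrValue;
    regs[dataReg] = dataValue;

    regs[kBarrierDst] = regs[addrReg]; // clobbers the data if it lives in rdi
    regs[kBarrierSrc] = regs[dataReg];

    return regs[kBarrierDst] == addrValue && regs[kBarrierSrc] == dataValue;
}
```

Under these assumptions, `ShuffleIsCorrect(RSI, RDI)` (the assignment in the dump above) returns false because the first move overwrites the data, while `ShuffleIsCorrect(RAX, RSI)` returns true. That is the situation the `data->GetRegNum() != REG_WRITE_BARRIER_DST` assert guards against.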

@jakobbotsch
Member

The problem is that gcIsWriteBarrierCandidate is expected to return the same result in LSRA and in codegen, yet its checks on the data node do not skip the COPY/RELOAD nodes that LSRA inserts. I will submit a fix.
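
The mismatch can be sketched with a toy tree model in C++. Everything here is hypothetical (`Oper`, `Node`, `SkipReloadOrCopy`, and both `NeedsBarrier*` functions), and the rule "constants never need a barrier" is a simplifying assumption, not the JIT's actual criteria. The sketch only shows why a check on the data node gives different answers before and after a RELOAD is wrapped around it, and how looking through the wrapper restores agreement.

```cpp
#include <cassert>

// Simplified node kinds; the real JIT's tree nodes are far richer.
enum class Oper { ConstHandle, ConstNull, LocalVar, Copy, Reload };

struct Node {
    Oper        oper;
    const Node* operand; // non-null only for Copy/Reload wrappers
};

// Strips any Copy/Reload wrappers that a register allocator placed on
// top of a value, returning the underlying node.
const Node* SkipReloadOrCopy(const Node* n) {
    while (n->oper == Oper::Copy || n->oper == Oper::Reload) {
        assert(n->operand != nullptr);
        n = n->operand;
    }
    return n;
}

// Toy rule (assumption): storing a constant never needs a barrier.
// This version inspects the data node exactly as given.
bool NeedsBarrierNaive(const Node* data) {
    return data->oper != Oper::ConstHandle && data->oper != Oper::ConstNull;
}

// Same rule, but applied to the node underneath any Copy/Reload, so the
// answer does not change once the allocator has inserted wrappers.
bool NeedsBarrierConsistent(const Node* data) {
    return NeedsBarrierNaive(SkipReloadOrCopy(data));
}
```

With `Node cns{Oper::ConstHandle, nullptr}` and `Node reload{Oper::Reload, &cns}`, the naive check says no barrier for `cns` but a barrier for `reload`, which is the kind of disagreement between allocation time and codegen time described above. The consistent check answers the same for both.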
As a side note the register allocation is very odd here (why is LSRA even inserting a RELOAD on top of a constant, right after its def?). After the fix we end up with this very subpar codegen:

       48BEE8086F5BBF7F0000 mov      rsi, 0x7FBF5B6F08E8
       4889B570FEFFFF       mov      gword ptr [rbp-190H], rsi
       48BE408400E07E7F0000 mov      rsi, 0x7F7EE0008440      ; data for <unknown class>:<unknown field>
       488BBD70FEFFFF       mov      rdi, gword ptr [rbp-190H]
       48893E               mov      gword ptr [rsi], rdi

cc @kunalspathak, the SPMI context above provides the repro case. There's some reuse of constants, maybe that's related.

[003011] 3112.#2455 C501  Def    Alloc    rsi  │     │V116a│     │V4  a│     │     │     │     │     │V117aV2  a│V34 a│V3  a│V1  a│
                                 Spill    rsi  │     │V116a│     │V4  a│     │     │     │     │     │V117aV2  a│V34 a│V3  a│V1  a│
[001640] 3114.#2456 C502  Def    Alloc    rsi  │     │V116a│     │V4  a│C502a│     │     │     │     │V117aV2  a│V34 a│V3  a│V1  a│
[003877] 3115.#2457 C502  Use *  Keep     rsi  │     │V116a│     │V4  a│C502i│     │     │     │     │V117aV2  a│V34 a│V3  a│V1  a│
         3115.#2458 C501  Use *  ReLod    rdi  │     │V116a│     │V4  a│     │C501a│     │     │     │V117aV2  a│V34 a│V3  a│V1  a│
                                 Keep     rdi  │     │V116a│     │V4  a│     │C501i│     │     │     │V117aV2  a│V34 a│V3  a│V1  a│
[003012] 3120.#2459 C503  Def    Reuse    rdi  │     │V116a│     │V4  a│     │C503a│     │     │     │V117aV2  a│V34 a│V3  a│V1  a│
[001644] 3121.#2460 C503  Use *  Keep     rdi  │     │V116a│     │V4  a│     │C503i│     │     │     │V117aV2  a│V34 a│V3  a│V1  a│

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Jan 31, 2023
gcIsWriteBarrierCandidate is expected to return the same results during
LSRA and during codegen, so it needs to skip GT_COPY and GT_RELOAD
inserted on top of the data node.

Fix dotnet#77141
Fix dotnet#77143
@ghost added the in-pr label Jan 31, 2023
jakobbotsch added a commit that referenced this issue Jan 31, 2023
@ghost removed the in-pr label Jan 31, 2023
@ghost locked as resolved and limited conversation to collaborators Mar 2, 2023