Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coredistools failure to disassemble vextracti64x4 #84967

Closed
BruceForstall opened this issue Apr 18, 2023 · 16 comments · Fixed by #85030
Closed

Coredistools failure to disassemble vextracti64x4 #84967

BruceForstall opened this issue Apr 18, 2023 · 16 comments · Fixed by #85030
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture
Milestone

Comments

@BruceForstall
Copy link
Member

SuperPMI is reporting an asm diff in:

JIT.HardwareIntrinsics.General._Vector512_1.VectorGetAndWithLowerAndUpper__GetAndWithLowerAndUpperByte:RunBasicScenario():this

but no textual diff, for both x64 and x86. For x64, it is in 236d7997-3d71-45f9-b7d7-5241ad89a56f.windows.x64\coreclr_tests.run.windows.x64.checked.mch.

The superpmi log shows:

[10:34:40] ERROR: Decode Failure Left@ offset 1fb4f77e456
[10:34:40] ERROR: Decode Failure Right@ offset 1fb52ec6816
[10:34:40] ERROR: Decode Failure Left@ offset 1fb4f77e456
[10:34:40] ERROR: Decode Failure Right@ offset 1fb52ec6816

The issue is that coredistools is failing to disassemble:

62D3BD483BF001       vextracti64x4 ymm8, zmm6, 1

coredistools was last updated based on LLVM 13.0.1 in March 2022 with dotnet/jitutils#351 (and related). The most recent version is LLVM 16.0.0 released March 2023 (https://releases.llvm.org/). Does it have a fix for this?

I verified that this disassembles correctly in Visual Studio.

Are there other cases where vextracti64x4 is properly decoded so it's not all cases, just some?

If we can't update coredistools with a fix, we'll have to do something so SuperPMI doesn't display spurious diffs. E.g., filter out the failing MCs.

@dotnet/avx512-contrib @dotnet/jit-contrib

@BruceForstall BruceForstall added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture labels Apr 18, 2023
@BruceForstall BruceForstall added this to the 8.0.0 milestone Apr 18, 2023
@BruceForstall BruceForstall self-assigned this Apr 18, 2023
@BruceForstall
Copy link
Member Author

I built a coredistools.dll updated to the latest LLVM, 16.0.1, and this failure persists.

@tannergooding
Copy link
Member

This looks to be an actual bug in the emitted codegen, noting the emitted BD vs the correct FD in the 3rd byte.

Emitted: 62D3BD483BF001
Correct: 62D3FD483BF001 

The technical encoding is: EVEX.512.66.0F3A.W1 3B /r ib. where the EVEX encoding is 0x62 P0 P1 P2:

* P0: R  X  B  R' 0  0  m  m
* P1: W  v  v  v  v  1  p  p
* P2: z  L' L  b  V' a  a  a

We've encoded (where P1E is emitted, P1C is correct):

* P0:  1 1 0 1  0 0 1 1
* P1E: 1 0 1 1  1 1 0 1
* P1C: 1 1 1 1  1 1 0 1
* P2:  0 1 0 0  1 0 0 0

So we're simply misencoding the VEX.vvvv register here. It's meant to be stored in bit inverted format and combined with the V' bit, we've got 1_0111 emitted, which is 8.

The error comes in that vextract doesn't use the vvvv field at all, and so it must be 1_1111. Instead since the instruction is ins reg/mem, reg, imm the mod/rm field is used to store the destination.

It's not immediately obvious where we're going wrong here and we may actually have a related issue with the AVX vextractf128 and vextracti128

@tannergooding
Copy link
Member

tannergooding commented Apr 19, 2023

So we do document the format incorrectly as IF_RRW_RRW_CNS when it should be IF_RWR_RRD_CNS. This is a long-standing issue and something we have wrong for a couple instructions, but one that doesn't cause any problems in practice.

The issue ends up being that the instruction is incorrectly tracking the IsDstDstSrcAVXInstruction flag, which means we double encode what is supposed to be just regFor012Bits also in the regFor3456Bits. This is unique to the Avx512 extract instructions and isn't an issue for the Avx ones after all

Fix should just be removing the flag. We should also probably audit the IsDstDstSrc and IsDstSrcSrc flags for all SIMD instructions. This is mostly needed for the new the AVX512 ones, but we've recently audited the other flags in general, so it wouldn't hurt to finish doing the same here.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 19, 2023
@BruceForstall
Copy link
Member Author

It was confusing because both VS and windbg disassemble this instruction as we expect to see it. Presumably because they ignore the field that is incorrectly encoded?

However, when I run this program (JIT\HardwareIntrinsics\General\Vector512_1\Vector512_1_ro\Vector512_1_ro.dll), with DOTNET_TieredCompilation=0 I get an illegal instruction exception (on both AMD 7950X and Intel Xeon Platinum 8370C).

So is this test case actually running in the CI? It must be if the test ended up in the spmi coreclr_tests.run collection.

Is it running but not correctly reporting the error/failure?

@tannergooding
Copy link
Member

So is this test case actually running in the CI?

The encoding is only incorrect when we have vextracti64x4 reg, reg. If we instead contain it as part of a store and emit vextracti64x4 [addr], reg we don't have any issues.

Up until yesterday we were also only running Vector512 tests in Pri1, so its possible that we weren't seeing this in CI because of that and that it will now be showing up in the outerloop jobs.

@BruceForstall
Copy link
Member Author

The latest outerloop job (from Apr 19, 2023 at 2:00 AM PDT) has a few failures in AVX tests, but not in this one:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=244156&view=ms.vss-test-web.build-test-results-tab

However, when I look at the console log for the HardwareIntrinsics_General_ro test, I see the failure:

11:03:55.549 Running test: _Vector512_1_ro::JIT.HardwareIntrinsics.General._Vector512_1.Program.GetAndWithLowerAndUpperByte()
Beginning scenario: RunBasicScenario
Fatal error. System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.
   at JIT.HardwareIntrinsics.General._Vector512_1.VectorGetAndWithLowerAndUpper__GetAndWithLowerAndUpperByte.RunBasicScenario()
   at JIT.HardwareIntrinsics.General._Vector512_1.Program.GetAndWithLowerAndUpperByte()
   at Program.<<Main>$>g__TestExecutor26|0_25(System.IO.StreamWriter, System.IO.StreamWriter, <>c__DisplayClass0_0 ByRef)
   at Program.<Main>$(System.String[])
App Exit Code: -1073741795
Expected: 100
Actual: -1073741795
END EXECUTION - FAILED
FAILED
[XUnitLogChecker]: Item 'HardwareIntrinsics_General_ro' did not finish running. Checking and fixing the log...
[XUnitLogChecker]: XUnit log file has been fixed!

1259/2363 tests run.
* 1259 tests passed.
* 0 tests failed.
* 0 tests skipped.

[XUnitLogChecker]: Checking for dumps...
[XUnitLogChecker]: No crash dumps found. Continuing...
[XUnitLogChecker]: Finished!
ERROR: The process "corerun.exe" not found.
2023-04-19T11:03:58.485Z	INFO   	run.py	run(48)	main	Beginning reading of test results.
2023-04-19T11:03:58.485Z	INFO   	run.py	__init__(42)	read_results	Searching 'C:\h\w\B51E0A11\w\AD88096E\e' for test results files
2023-04-19T11:03:58.485Z	INFO   	run.py	__init__(48)	read_results	Found results file C:\h\w\B51E0A11\w\AD88096E\e\JIT\HardwareIntrinsics\HardwareIntrinsics_General_ro\HardwareIntrinsics_General_ro.testResults.xml with format xunit
2023-04-19T11:03:58.501Z	INFO   	run.py	__init__(42)	read_results	Searching 'C:\h\w\B51E0A11\w\AD88096E\uploads' for test results files
2023-04-19T11:03:58.501Z	INFO   	run.py	packing_test_reporter(30)	report_results	Packing 1259 test reports to 'C:\h\w\B51E0A11\w\AD88096E\e\__test_report.json'
2023-04-19T11:03:58.501Z	INFO   	run.py	packing_test_reporter(33)	report_results	Packed 281920 bytes
['HardwareIntrinsics_General_ro' END OF WORK ITEM LOG: Command exited with 0]

https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-heads-main-abd91328c2f94763aa/HardwareIntrinsics_General_ro/1/console.b75ffbb8.log?helixlogtype=result

BUT, notice that even though the test failed, the harness returns 0 -- no failure.

Possibly this is a XUnitLogChecker problem?

=======

btw, the tests are invoked with:

call JIT\HardwareIntrinsics\HardwareIntrinsics_General_ro\HardwareIntrinsics_General_ro.cmd -usewatcher 

I don't have a HardwareIntrinsics_General_ro.cmd in my locally built artifacts, even though I built pri-1 tests. Why?

I DO have artifacts\tests\coreclr\windows.x64.Checked\JIT\HardwareIntrinsics\General\Vector512_1\Vector512_1_ro\Vector512_1_ro.cmd, for example.

@tannergooding
Copy link
Member

Possibly this is a XUnitLogChecker problem?

CC. @trylek are there any known issues here? Could we be missing reported tests failures in some cases?

Perhaps we are looking for # failed and not accounting for tests which didn't run, crashes, bad exit codes, or some timeouts?

@tannergooding
Copy link
Member

I don't have a HardwareIntrinsics_General_ro.cmd in my locally built artifacts, even though I built pri-1 tests. Why?

Did you do anything related to BuildAsStandalone? I think that may stop the merged test wrappers from being built.

@BruceForstall
Copy link
Member Author

Also, @ivdiazsa -- it looks like XUnitLogChecker is ignoring the test run exit code

@BruceForstall
Copy link
Member Author

Did you do anything related to BuildAsStandalone? I think that may stop the merged test wrappers from being built.

Good point. I always build with BuildAsStandalone=true

@BruceForstall
Copy link
Member Author

I opened #85056 to track the CI infrastructure problem (of not reporting this test failure)

@markples
Copy link
Member

This is the case where the failure takes down the entire process. Presumably the test that you're looking for didn't run. We don't have the wrapper logic to attempt to rerun the rest of the test group after the failure. Mitigations are (1) Disable (or mark as RequiresProcessIsolation) catastrophic failures asap to unblock other testing, and (2) keep a balanced merged test group size (# number of 100s) to keep the wins from merging but reduce the collateral damage from these failures.

I see the logic in src/tests/Directory.Build.targets that disables merged test groups under BuildAsStandalone. It's possible that this isn't necessary. Similar to RequiresProcessIsolation, this generates a .cmd/.sh/local entry point, but a merged group can still execute it out-of-process. I guess often it wouldn't be useful since you can run the test standalone.

@tannergooding
Copy link
Member

This is the case where the failure takes down the entire process

Shouldn't the harness try to detect this and mark itself as failing? It looks like we do capture the test process having a bad exit code and that we didn't run everything...

@markples
Copy link
Member

There's an issue where one level needs to report at "success" to avoid helix retrying the job. This isn't an area that I understand - those you've already tagged (plus maybe @jkoritzinsky?) should know more.

@BruceForstall
Copy link
Member Author

Presumably the test that you're looking for didn't run

@markples There are multiple things going on here, but the core problem is the important test here, that took down the merged group, did not get reported as failing (and in fact the entire group did not get reported as failing).

@ivdiazsa
Copy link
Member

Bruce is right. It's a bug with the log checker I'm already looking into. I have no idea why it's overriding the test's exit code. It might have to do with the contents of the HelixCommand script, as the log checker is the last thing to get called.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 19, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants