-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coredistools failure to disassemble vextracti64x4 #84967
Comments
I built a coredistools.dll updated to the latest LLVM, 16.0.1, and this failure persists. |
This looks to be an actual bug in the emitted codegen, noting the emitted
The technical encoding is:
We've encoded (where P1E is emitted, P1C is correct):
So we're simply misencoding the The error comes in that It's not immediately obvious where we're going wrong here and we may actually have a related issue with the AVX |
So we do document the format incorrectly as The issue ends up being that the instruction is incorrectly tracking the Fix should just be removing the flag. We should also probably audit the |
It was confusing because both VS and windbg disassemble this instruction as we expect to see it. Presumably because they ignore the field that is incorrectly encoded? However, when I run this program (JIT\HardwareIntrinsics\General\Vector512_1\Vector512_1_ro\Vector512_1_ro.dll), with So is this test case actually running in the CI? It must be if the test ended up in the spmi coreclr_tests.run collection. Is it running but not correctly reporting the error/failure? |
The encoding is only incorrect when we have Up until yesterday we were also only running Vector512 tests in Pri1, so its possible that we weren't seeing this in CI because of that and that it will now be showing up in the outerloop jobs. |
The latest outerloop job (from Apr 19, 2023 at 2:00 AM PDT) has a few failures in AVX tests, but not in this one: However, when I look at the console log for the HardwareIntrinsics_General_ro test, I see the failure:
BUT, notice that even though the test failed, the harness returns 0 -- no failure. Possibly this is a XUnitLogChecker problem? ======= btw, the tests are invoked with:
I don't have a I DO have artifacts\tests\coreclr\windows.x64.Checked\JIT\HardwareIntrinsics\General\Vector512_1\Vector512_1_ro\Vector512_1_ro.cmd, for example. |
CC. @trylek are there any known issues here? Could we be missing reported tests failures in some cases? Perhaps we are looking for |
Did you do anything related to |
Also, @ivdiazsa -- it looks like XUnitLogChecker is ignoring the test run exit code |
Good point. I always build with |
I opened #85056 to track the CI infrastructure problem (of not reporting this test failure) |
This is the case where the failure takes down the entire process. Presumably the test that you're looking for didn't run. We don't have the wrapper logic to attempt to rerun the rest of the test group after the failure. Mitigations are (1) Disable (or mark as RequiresProcessIsolation) catastrophic failures asap to unblock other testing, and (2) keep a balanced merged test group size (# number of 100s) to keep the wins from merging but reduce the collateral damage from these failures. I see the logic in src/tests/Directory.Build.targets that disables merged test groups under BuildAsStandalone. It's possible that this isn't necessary. Similar to RequiresProcessIsolation, this generates a .cmd/.sh/local entry point, but a merged group can still execute it out-of-process. I guess often it wouldn't be useful since you can run the test standalone. |
Shouldn't the harness try to detect this and mark itself as failing? It looks like we do capture the test process having a bad exit code and that we didn't run everything... |
There's an issue where one level needs to report at "success" to avoid helix retrying the job. This isn't an area that I understand - those you've already tagged (plus maybe @jkoritzinsky?) should know more. |
@markples There are multiple things going on here, but the core problem is the important test here, that took down the merged group, did not get reported as failing (and in fact the entire group did not get reported as failing). |
Bruce is right. It's a bug with the log checker I'm already looking into. I have no idea why it's overriding the test's exit code. It might have to do with the contents of the HelixCommand script, as the log checker is the last thing to get called. |
SuperPMI is reporting an asm diff in:
but no textual diff, for both x64 and x86. For x64, it is in 236d7997-3d71-45f9-b7d7-5241ad89a56f.windows.x64\coreclr_tests.run.windows.x64.checked.mch.
The superpmi log shows:
The issue is that coredistools is failing to disassemble:
coredistools was last updated based on LLVM 13.0.1 in March 2022 with dotnet/jitutils#351 (and related). The most recent version is LLVM 16.0.0 released March 2023 (https://releases.llvm.org/). Does it have a fix for this?
I verified that this disassembles correctly in Visual Studio.
Are there other cases where vextracti64x4 is properly decoded so it's not all cases, just some?
If we can't update coredistools with a fix, we'll have to do something so SuperPMI doesn't display spurious diffs. E.g., filter out the failing MCs.
@dotnet/avx512-contrib @dotnet/jit-contrib
The text was updated successfully, but these errors were encountered: