High failure rate on Windows MSVC CI with filesystem errors #127883
Comments
Do you have a link to a failed pipeline run?
You can also copy any of the
Another: #128982 (comment)
OK, so the problem is that deleting a file (ultimately using
In my tests, the underlying error is
Cargo sets FILE_ATTRIBUTE_NOT_CONTENT_INDEXED on the target dir, but it looks like bootstrap doesn't do the same for the build dir. I hope GitHub doesn't enable indexing on the CI machines, but if it does, setting that attribute on the build dir might help.
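A minimal sketch of what setting that attribute on a build directory could look like in Rust, assuming direct FFI to the Win32 GetFileAttributesW and SetFileAttributesW functions in kernel32 rather than whatever helper Cargo actually uses. The names with_not_indexed and mark_not_indexed are hypothetical; on non-Windows targets the function is a no-op.

```rust
use std::io;
use std::path::Path;

/// Win32 attribute bit that asks the indexing service to skip a file or directory.
pub const FILE_ATTRIBUTE_NOT_CONTENT_INDEXED: u32 = 0x0000_2000;

/// Returns `attrs` with the not-content-indexed bit set, preserving all other bits.
pub fn with_not_indexed(attrs: u32) -> u32 {
    attrs | FILE_ATTRIBUTE_NOT_CONTENT_INDEXED
}

/// Hypothetical helper: marks `dir` as excluded from content indexing.
#[cfg(windows)]
pub fn mark_not_indexed(dir: &Path) -> io::Result<()> {
    use std::os::windows::ffi::OsStrExt;

    #[link(name = "kernel32")]
    extern "system" {
        fn GetFileAttributesW(file_name: *const u16) -> u32;
        fn SetFileAttributesW(file_name: *const u16, attrs: u32) -> i32;
    }
    const INVALID_FILE_ATTRIBUTES: u32 = u32::MAX;

    // The wide-string Win32 APIs expect a NUL-terminated UTF-16 path.
    let wide: Vec<u16> = dir
        .as_os_str()
        .encode_wide()
        .chain(std::iter::once(0))
        .collect();

    unsafe {
        // Read the current attributes so that only the indexing bit changes.
        let attrs = GetFileAttributesW(wide.as_ptr());
        if attrs == INVALID_FILE_ATTRIBUTES {
            return Err(io::Error::last_os_error());
        }
        if SetFileAttributesW(wide.as_ptr(), with_not_indexed(attrs)) == 0 {
            return Err(io::Error::last_os_error());
        }
    }
    Ok(())
}

/// Content indexing is a Windows concept; elsewhere this does nothing.
#[cfg(not(windows))]
pub fn mark_not_indexed(_dir: &Path) -> io::Result<()> {
    Ok(())
}
```

In bootstrap, a call like this would presumably go right after the build directory is created, mirroring what Cargo does for its target dir.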
I generated a Process Monitor .pml file. It can be opened using procmon64. However, I'm not yet seeing any obvious cause of the
Affects my PR #129019, so I thought I'd take a gander at the PML shared here. There were two One of them happened before any [1]:
If the Handle tool doesn't reveal anything, the last resort would be a live kernel dump file, captured with a PS helper if no GUI is installed, or with Task Manager (right-click PID 4 and then "create kernel dump"). It goes without saying that the kernel dump should be shared if and only if the machine doesn't carry any sensitive information.
The CI machines probably have sensitive authorization tokens somewhere in their memory, unfortunately. Though T-infra is pretty diligent about the principle of least privilege, they still need to move some data around to some fairly specific buckets.
We have tried using
The failure in #130569 (comment) has a case where:
in addition to a different failure
so there's definitely something stepping over something.
Starting around 2024-06-27, the rust-lang CI began to encounter a very high failure rate on the MSVC Windows builders (~15% of all builds?). These builders are hitting various filesystem errors, such as "Access is denied", "used by another process", "cannot open file", etc.
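When triaging logs, it may help that the first two messages correspond to standard Win32 error codes: "Access is denied" is ERROR_ACCESS_DENIED (5) and "used by another process" is ERROR_SHARING_VIOLATION (32). A minimal sketch of bucketing a Rust io::Error by those codes follows; the FsFailure enum and classify function are hypothetical names, and other messages such as "cannot open file" are simply left in the catch-all bucket.

```rust
use std::io;

/// Win32 ERROR_ACCESS_DENIED: "Access is denied."
const ERROR_ACCESS_DENIED: i32 = 5;
/// Win32 ERROR_SHARING_VIOLATION: "The process cannot access the file
/// because it is being used by another process."
const ERROR_SHARING_VIOLATION: i32 = 32;

#[derive(Debug, PartialEq)]
enum FsFailure {
    AccessDenied,
    SharingViolation,
    Other,
}

/// Buckets an error by its raw OS code. The codes are Win32 values, so the
/// result is only meaningful for errors that came from Windows APIs.
fn classify(err: &io::Error) -> FsFailure {
    match err.raw_os_error() {
        Some(ERROR_ACCESS_DENIED) => FsFailure::AccessDenied,
        Some(ERROR_SHARING_VIOLATION) => FsFailure::SharingViolation,
        _ => FsFailure::Other,
    }
}
```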
Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/242791-t-infra/topic/Spurious.20CI.20errors.20on.20x86_64-msvc-ext
The following is a sample of the errors seen:
On 2024-07-02, #127152 added a mitigation in bootstrap for the most common culprit: bootstrap attempting to delete an executable. However, several other programs are still having problems, such as rustc itself, the MSVC linker, and cargo.
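A common style of mitigation for this class of failure is to retry the filesystem operation with a short backoff when it fails with ERROR_ACCESS_DENIED (5) or ERROR_SHARING_VIOLATION (32). The sketch below illustrates that general technique only; it is not the actual change in #127152, retry_transient and remove_file_with_retry are hypothetical names, and the attempt count and delays are arbitrary.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: retries `op` while it fails with a Win32 "access
/// denied" (5) or "sharing violation" (32) error, doubling the sleep after
/// each failed attempt. Any other error is returned immediately.
fn retry_transient<T, F>(mut op: F, attempts: u32) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    let mut delay = Duration::from_millis(50);
    let mut tries = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                tries += 1;
                let transient = matches!(e.raw_os_error(), Some(5) | Some(32));
                if !transient || tries >= attempts {
                    return Err(e);
                }
                thread::sleep(delay);
                delay *= 2;
            }
        }
    }
}

/// Removes a file, tolerating a handle that another process holds only briefly.
fn remove_file_with_retry(path: &Path) -> io::Result<()> {
    retry_transient(|| fs::remove_file(path), 5)
}
```

A retry only papers over a handle that is held briefly; it does not identify which process holds it.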
I have tried using handle.exe and the Restart Manager API to detect whether another process has an open handle on a file, with no success. I have also tried rolling back the source tree to 2024-06-25 (before we started seeing the problem in CI), and it still reproduces.
The last Windows image release before this started was https://github.com/actions/runner-images/releases/tag/win22%2F20240618.1. From what I can tell, it rolled out about 5 days before the failures began, with no failures in between, so it is difficult to tell whether it is related.
The last stage0 bump was 2024-06-11, several weeks before it started.
actions/runner-images#4086 is a similar issue we've had in the past, though we don't know what the fix was.
From what I can tell, this doesn't seem to be affecting the GNU builders.