System.IO.Net5Compat.Tests and System.IO.Tests suddenly exiting with error 137 #100558

Open
carlossanlop opened this issue Apr 2, 2024 · 14 comments
Assignees
Labels: arch-x64, area-System.IO, Known Build Error (use this to report build issues in the .NET Helix tab), os-linux (Linux OS, any supported distro)
Milestone

Comments

@carlossanlop
Member

carlossanlop commented Apr 2, 2024

The System.IO.Net5Compat.Tests and the System.IO.Tests test processes are intermittently getting killed on Linux shortly after starting, and the exit code is 137.

Build Information

Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=627407
Build error leg or test failing: System.IO.Net5Compat.Tests

Error Message

{
  "ErrorPattern": ["Starting:    System\\.IO\\.(Net5Compat\\.)?Tests", "exit code 137"],
  "BuildRetry" : true,
  "ExcludeConsoleLog" : false
}
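As a rough illustration of what this configuration is meant to catch, the sketch below applies the two `ErrorPattern` regexes to console log lines taken from the examples in this issue. It assumes the patterns must match on successive lines in the listed order, though not necessarily adjacent ones. That matching rule and the helper name `matches_in_order` are assumptions made for illustration, not the actual Build Analysis implementation.

```python
import json
import re

# The known-issue JSON from this issue, kept verbatim (raw string so the
# JSON escapes such as \\. survive until json.loads decodes them).
KNOWN_ISSUE_JSON = r'''
{
  "ErrorPattern": ["Starting:    System\\.IO\\.(Net5Compat\\.)?Tests", "exit code 137"],
  "BuildRetry" : true,
  "ExcludeConsoleLog" : false
}
'''

# Console log lines copied from the System.IO.Net5Compat.Tests example.
SAMPLE_LOG = [
    "  Discovered:  System.IO.Net5Compat.Tests (found 679 of 685 test cases)",
    "  Starting:    System.IO.Net5Compat.Tests (parallel test collections = on, max threads = 2)",
    "----- end Tue Apr 2 20:20:02 UTC 2024 ----- exit code 137 ----------------------------------------------------------",
]


def matches_in_order(patterns, log_lines):
    """Hypothetical helper: True if each regex matches a later line than the previous one."""
    remaining = iter(log_lines)  # shared iterator, so the search never moves backwards
    for pattern in patterns:
        regex = re.compile(pattern)
        if not any(regex.search(line) for line in remaining):
            return False
    return True


patterns = json.loads(KNOWN_ISSUE_JSON)["ErrorPattern"]
print(matches_in_order(patterns, SAMPLE_LOG))
```

The optional group `(Net5Compat\.)?` is what lets a single pattern cover both test assemblies named in the title.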

System.IO.Net5Compat.Tests example

===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.IO.Net5Compat.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.IO.Net5Compat.Tests (found 679 of 685 test cases)
  Starting:    System.IO.Net5Compat.Tests (parallel test collections = on, max threads = 2)
./RunTests.sh: line 162:    25 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.Net5Compat.Tests.runtimeconfig.json --depsfile System.IO.Net5Compat.Tests.deps.json xunit.console.dll System.IO.Net5Compat.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Tue Apr 2 20:20:02 UTC 2024 ----- exit code 137 ----------------------------------------------------------

System.IO.Tests example

===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.IO.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.IO.Tests (found 679 of 685 test cases)
  Starting:    System.IO.Tests (parallel test collections = on, max threads = 2)
./RunTests.sh: line 162:    25 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.Tests.runtimeconfig.json --depsfile System.IO.Tests.deps.json xunit.console.dll System.IO.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Tue Apr 2 20:20:10 UTC 2024 ----- exit code 137 ----------------------------------------------------------

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=627407
Error message validated: [Starting: System\.IO\.(Net5Compat\.)?Tests exit code 137]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 4/2/2024 11:08:28 PM UTC

Report

Build Definition Test Pull Request
803018 dotnet/runtime System.IO.Tests.WorkItemExecution #107592
808544 dotnet/runtime System.IO.Tests.WorkItemExecution #107826
806951 dotnet/runtime System.IO.Tests.WorkItemExecution #107764
806827 dotnet/runtime System.IO.Tests.WorkItemExecution #107757
2535358 dotnet-runtime System.IO.Tests.WorkItemExecution #42716
2534129 dotnet-runtime System.IO.Tests.WorkItemExecution #42427
791460 dotnet/runtime System.IO.Tests.WorkItemExecution #107102
798942 dotnet/runtime System.IO.Tests.WorkItemExecution
798165 dotnet/runtime System.IO.Tests.WorkItemExecution #107382
798004 dotnet/runtime System.IO.Tests.WorkItemExecution
797902 dotnet/runtime System.IO.Tests.WorkItemExecution
797874 dotnet/runtime System.IO.Tests.WorkItemExecution #107039
797671 dotnet/runtime System.IO.Tests.WorkItemExecution #107249
794247 dotnet/runtime System.IO.Tests.WorkItemExecution #107147
797588 dotnet/runtime System.IO.Tests.WorkItemExecution #107359
797418 dotnet/runtime System.IO.Tests.WorkItemExecution #106923
797398 dotnet/runtime System.IO.Tests.WorkItemExecution #106922
796949 dotnet/runtime System.IO.Tests.WorkItemExecution
794502 dotnet/runtime System.IO.Tests.WorkItemExecution #107220
796803 dotnet/runtime System.IO.Tests.WorkItemExecution #107321
796737 dotnet/runtime System.IO.Tests.WorkItemExecution #107314
796270 dotnet/runtime System.IO.Tests.WorkItemExecution #106924
796256 dotnet/runtime System.IO.Tests.WorkItemExecution #106922
792543 dotnet/runtime System.IO.Tests.WorkItemExecution #106403
795379 dotnet/runtime System.IO.Tests.WorkItemExecution #107249
794842 dotnet/runtime System.IO.Tests.WorkItemExecution #106924
794838 dotnet/runtime System.IO.Tests.WorkItemExecution #106923
794565 dotnet/runtime System.IO.Tests.WorkItemExecution #106725
794494 dotnet/runtime System.IO.Tests.WorkItemExecution
794124 dotnet/runtime System.IO.Tests.WorkItemExecution #107203
793852 dotnet/runtime System.IO.Tests.WorkItemExecution #106924
793828 dotnet/runtime System.IO.Tests.WorkItemExecution #106961
793027 dotnet/runtime System.IO.Tests.WorkItemExecution #107147
793300 dotnet/runtime System.IO.Tests.WorkItemExecution #107168
793208 dotnet/runtime System.IO.Tests.WorkItemExecution #107116
793190 dotnet/runtime System.IO.Tests.WorkItemExecution #107156
793084 dotnet/runtime System.IO.Tests.WorkItemExecution
792978 dotnet/runtime System.IO.Tests.WorkItemExecution
792934 dotnet/runtime System.IO.Tests.WorkItemExecution #107155
792830 dotnet/runtime System.IO.Tests.WorkItemExecution #107152
792661 dotnet/runtime System.IO.Tests.WorkItemExecution #107075
792667 dotnet/runtime System.IO.Tests.WorkItemExecution
792675 dotnet/runtime System.IO.Tests.WorkItemExecution #107093
792580 dotnet/runtime System.IO.Tests.WorkItemExecution #107116
792557 dotnet/runtime System.IO.Tests.WorkItemExecution #107138
791056 dotnet/runtime System.IO.Tests.WorkItemExecution #106965
791294 dotnet/runtime System.IO.Tests.WorkItemExecution #107093
792370 dotnet/runtime System.IO.Tests.WorkItemExecution #107133
792328 dotnet/runtime System.IO.Tests.WorkItemExecution #107027
792273 dotnet/runtime System.IO.Tests.WorkItemExecution #107036
792222 dotnet/runtime System.IO.Tests.WorkItemExecution #107096
792163 dotnet/runtime System.IO.Tests.WorkItemExecution #107079
792088 dotnet/runtime System.IO.Tests.WorkItemExecution #107126
792074 dotnet/runtime System.IO.Tests.WorkItemExecution
792059 dotnet/runtime System.IO.Tests.WorkItemExecution #107124
791252 dotnet/runtime System.IO.Tests.WorkItemExecution #106403
792002 dotnet/runtime System.IO.Tests.WorkItemExecution #107117
791936 dotnet/runtime System.IO.Tests.WorkItemExecution #107059
791902 dotnet/runtime System.IO.Tests.WorkItemExecution
791827 dotnet/runtime System.IO.Tests.WorkItemExecution #107117
791806 dotnet/runtime System.IO.Tests.WorkItemExecution #107115
791750 dotnet/runtime System.IO.Tests.WorkItemExecution #106599
791216 dotnet/runtime System.IO.Tests.WorkItemExecution #107085
791664 dotnet/runtime System.IO.Tests.WorkItemExecution #106787
791605 dotnet/runtime System.IO.Tests.WorkItemExecution #107101
791633 dotnet/runtime System.IO.Tests.WorkItemExecution #107109
791534 dotnet/runtime System.IO.Tests.WorkItemExecution
791491 dotnet/runtime System.IO.Tests.WorkItemExecution #107028
791488 dotnet/runtime System.IO.Tests.WorkItemExecution #107027
791487 dotnet/runtime System.IO.Tests.WorkItemExecution #106875
791484 dotnet/runtime System.IO.Tests.WorkItemExecution #106873
789647 dotnet/runtime System.IO.Tests.WorkItemExecution #106881
788962 dotnet/runtime System.IO.Tests.WorkItemExecution #106985
790678 dotnet/runtime System.IO.Tests.WorkItemExecution #104487
790653 dotnet/runtime System.IO.Tests.WorkItemExecution #107064
790626 dotnet/runtime System.IO.Tests.WorkItemExecution #106854
2525445 dotnet-runtime System.IO.Tests.WorkItemExecution #42161
790560 dotnet/runtime System.IO.Tests.WorkItemExecution #106563
790537 dotnet/runtime System.IO.Tests.WorkItemExecution #106994
790524 dotnet/runtime System.IO.Tests.WorkItemExecution #107058
2525441 dotnet-runtime System.IO.Net5Compat.Tests.WorkItemExecution #42160
790459 dotnet/runtime System.IO.Tests.WorkItemExecution
790349 dotnet/runtime System.IO.Tests.WorkItemExecution #106988
790346 dotnet/runtime System.IO.Tests.WorkItemExecution #107028
790260 dotnet/runtime System.IO.Tests.WorkItemExecution #106854
790162 dotnet/runtime System.IO.Tests.WorkItemExecution #106633
790152 dotnet/runtime System.IO.Tests.WorkItemExecution #106599
790300 dotnet/runtime System.IO.Tests.WorkItemExecution #106972
790294 dotnet/runtime System.IO.Tests.WorkItemExecution #106971
790315 dotnet/runtime System.IO.Tests.WorkItemExecution #107004
790074 dotnet/runtime System.IO.Tests.WorkItemExecution #107038
788155 dotnet/runtime System.IO.Tests.WorkItemExecution #106951
790039 dotnet/runtime System.IO.Tests.WorkItemExecution #107035
790013 dotnet/runtime System.IO.Tests.WorkItemExecution
789864 dotnet/runtime System.IO.Tests.WorkItemExecution #106924
789294 dotnet/runtime System.IO.Tests.WorkItemExecution #107004
789853 dotnet/runtime System.IO.Tests.WorkItemExecution #106923
789842 dotnet/runtime System.IO.Tests.WorkItemExecution #106961
789836 dotnet/runtime System.IO.Tests.WorkItemExecution #106922
789140 dotnet/runtime System.IO.Tests.WorkItemExecution
Displaying 100 of 223 results

Summary

24-Hour Hit Count: 1
7-Day Hit Count: 2
1-Month Count: 223
@carlossanlop carlossanlop added area-System.IO arch-x64 runtime-coreclr specific to the CoreCLR runtime os-linux-musl Linux distributions using musl library. Known Build Error Use this to report build issues in the .NET Helix tab labels Apr 2, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 2, 2024
@carlossanlop carlossanlop changed the title System.IO.Net5Compat.Tests suddenly exiting with error 137 System.IO.Net5Compat.Tests and System.IO.Tests suddenly exiting with error 137 Apr 2, 2024
@carlossanlop carlossanlop added os-linux Linux OS (any supported distro) runtime-mono specific to the Mono runtime labels Apr 2, 2024
@ericstj ericstj removed os-linux Linux OS (any supported distro) os-linux-musl Linux distributions using musl library. runtime-mono specific to the Mono runtime runtime-coreclr specific to the CoreCLR runtime labels Apr 12, 2024
@ericstj
Member

ericstj commented Apr 12, 2024

@dotnet/area-system-io there are a lot of hits on this, and they are relatively recent. It seems to be happening across many configurations. I think it's worth having a look.

@adamsitnik
Member

The System.IO.Net5Compat.Tests and the System.IO.Tests test processes are intermittently getting killed on Linux shortly after starting, and the exit code is 137.

137 means out of memory. We have not made any changes to 6.0 in System.IO, so I expect that either there was some infra change (like less memory available) or a bug was introduced in the product itself. The bug would be specific to Linux.
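A note on the exit code: when a child process is terminated by signal N, shells such as bash report its status as 128 + N, so 137 corresponds to signal 9 (SIGKILL). That is the signal the Linux OOM killer sends, but anything else that sends SIGKILL produces the same status, so 137 is consistent with an out-of-memory kill rather than proof of one. The small sketch below decodes such a status. The function name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Hypothetical helper: interpret a shell-reported exit status."""
    if code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        # A program could also call exit(137) itself, so this is a hint only.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited on its own with status {code}"


print(describe_exit_code(137))
```

This matches the `Killed` message that bash prints in the `RunTests.sh` output quoted in the issue description.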

@carlossanlop is it possible to perform some kind of binary search based on the merged PRs and when it started to fail?

@carlossanlop
Member Author

@adamsitnik @jozkee This is one of the most impactful failures in servicing. It only affects System.IO.Tests and System.IO.Net5Compat.Tests. Any chance you can take a look soon?

@adamsitnik
Member

@carlossanlop sure, but could you please answer the question I've asked in #100558 (comment) ?

@carlossanlop
Member Author

Sorry, I missed that question. Yes, you can use Kusto. David has used it many times in the past.

@carlossanlop
Member Author

This is the super basic Kusto query you can execute if you are looking things up by issue:

TestKnownIssues
| union KnownIssues
| where IssueId == ""

This database stores data from the last 4 months so hopefully there's still info from April.

This is the cluster where you would look for that info: https://dataexplorer.azure.com/clusters/dotnetperf.westus/databases/PerformanceData

Unfortunately it seems that failure data is not stored if it's not linked to an issue.

Thanks @AlitzelMendez for the above info.

@adamsitnik adamsitnik modified the milestones: 9.0.0, 10.0.0 Aug 21, 2024
@jeffhandley
Member

This test is failing a lot with 33 hits over the past 24 hours. We need to bring this back into 9.0.0, get it resolved, and plan to backport whatever change we make to the release/9.0 branch to clean up the failures there.

@vcsjones
Member

I suspect it is this test

[ConditionalTheory(typeof(PlatformDetection), nameof(PlatformDetection.Is64BitProcess))]
[MemberData(nameof(MemoryStream_PositionOverflow_Throws_MemberData))]
[SkipOnPlatform(TestPlatforms.iOS | TestPlatforms.tvOS, "https://github.com/dotnet/runtime/issues/92467")]
[ActiveIssue("https://github.com/dotnet/runtime/issues/100225", typeof(PlatformDetection), nameof(PlatformDetection.IsMonoRuntime), nameof(PlatformDetection.IsWindows), nameof(PlatformDetection.IsX64Process))]
public void MemoryStream_SeekOverflow_Throws(SeekMode mode, int bufferSize, int origin)

It is already disabled and noted to be problematic in certain environments. I don't know how much memory the ADO containers have, but this test does a couple of 2GB allocations.

I suspect you are just hitting the CoreCLR version of this Mono failure. #100225
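To put a number on "how much memory the containers have", one option is to read the cgroup limit from inside the container and compare it against the allocations the test makes. The sketch below assumes cgroup v2, where the limit is exposed at `/sys/fs/cgroup/memory.max` and contains either a byte count or the literal `max`. Whether the Helix/ADO containers use cgroup v2 is not known from this thread, and the helper names are hypothetical. The two 2 GiB figures come from the comment above, and the check ignores everything else the runtime and test host allocate.

```python
from pathlib import Path
from typing import Iterable, Optional

GIB = 1024 ** 3
# Assumption: cgroup v2 layout; cgroup v1 uses a different path.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")


def parse_memory_max(text: str) -> Optional[int]:
    """Parse memory.max contents: a byte count, or 'max' meaning no limit (None)."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_container_limit(path: Path = CGROUP_V2_LIMIT) -> Optional[int]:
    """Return the limit in bytes, or None if unlimited or not readable."""
    try:
        return parse_memory_max(path.read_text())
    except (OSError, ValueError):
        return None


def fits(limit: Optional[int], allocations: Iterable[int]) -> bool:
    """True if the summed allocations stay within the limit (None = no known limit)."""
    return limit is None or sum(allocations) <= limit


limit = read_container_limit()
print("limit:", "none/unknown" if limit is None else f"{limit / GIB:.1f} GiB")
print("two 2 GiB buffers fit:", fits(limit, [2 * GIB, 2 * GIB]))
```

Logging a value like this at the start of the work item would at least show whether the limit changed between the builds that pass and the ones that get killed.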

@vcsjones
Member

That is the only test in System.IO.Tests that does any significant memory allocation that I was able to observe.

@adamsitnik
Member

this test does a couple of 2GB allocations.

It's most likely one of the tests that causes the OOM 👍

But I am not sure that it's the only one:

@vcsjones
Member

this should manifest as a managed OOM that does not take the testing app down?

This test failure looks like the Linux OOM killer. The .NET process was able to allocate memory, but Linux ran out of memory shortly afterwards. When that happens, Linux runs the OOM killer to start killing processes.

See https://www.kernel.org/doc/gorman/html/understand/understand016.html for more information.

The OOM killer decided that the .NET process was the right one to take down.
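The gap between "the allocation succeeded" and "the system ran out of memory" comes from lazy commit: an anonymous mapping initially reserves address space only, and physical pages are assigned when they are first written. The sketch below shows that effect on a small buffer by sampling the peak resident set size before the mapping, after it, and after touching every page. It illustrates the general Linux behaviour, not how the .NET GC or `MemoryStream` allocates, and the exact numbers depend on the kernel's overcommit settings. The function names are hypothetical.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def peak_rss_kib() -> int:
    # On Linux ru_maxrss is reported in KiB (other platforms may use bytes).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def reserve_then_touch(size_bytes: int):
    """Map anonymous memory, then write one byte per page to force it resident."""
    before = peak_rss_kib()
    buf = mmap.mmap(-1, size_bytes)      # anonymous mapping: address space only
    after_reserve = peak_rss_kib()
    touched = 0
    for offset in range(0, size_bytes, PAGE):
        buf[offset] = 1                  # first write makes the kernel back this page
        touched += 1
    after_touch = peak_rss_kib()
    buf.close()
    return before, after_reserve, after_touch, touched


before, after_reserve, after_touch, touched = reserve_then_touch(64 * 1024 * 1024)
print(f"peak RSS before mapping: {before} KiB")
print(f"peak RSS after mapping:  {after_reserve} KiB")
print(f"peak RSS after touching {touched} pages: {after_touch} KiB")
```

On a typical Linux system the peak RSS barely moves when the mapping is created and grows only once the pages are written. That is why the failure can surface some time after the allocation call returned, and as a SIGKILL from the kernel instead of a managed `OutOfMemoryException`.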

@adamsitnik
Member

@vcsjones thanks, I was not aware of that! (BTW it sucks, because in a way it hides quite important information, like the stack trace of the method that caused the OOM.)

@vcsjones
Member

vcsjones commented Sep 1, 2024

I think that test was contributing to the problem. The issue is still occurring in the release/9.0 branch, which is why the 24-hour cell is not zero.
