Crossgen'ed powershell assemblies cause an intermittent hang when running our tests on Linux and OSX #7495

Closed
daxian-dbw opened this issue Feb 23, 2017 · 10 comments
Labels: area-CrossGen/NGEN-coreclr, area-ReadyToRun-coreclr, blocking (marks issues that we want to fast track in order to unblock other important work)

Comments

@daxian-dbw
Contributor

We found that the crossgen’ed PowerShell assemblies (targeting .NET Core) cause an intermittent hang when running our PowerShell class basic parsing tests on Linux/OSX. We have been seeing this issue in our Travis CI builds for some time. I tried to reproduce it locally by running the tests in a loop and found that it reproduces only with the crossgen’ed assemblies (see the example screenshot below).

Repro steps

  1. Install powershell_6.0.0-alpha.16-1ubuntu1.14.04.1_amd64.deb on an Ubuntu 14.04 x64 machine. The PowerShell assemblies in that package were crossgen'ed with the "crossgen" executable from ".nuget/packages/runtime.ubuntu.14.04-x64.Microsoft.NETCore.Runtime.CoreCLR/1.1.0/tools/crossgen".
  2. Download the attached "tests.tar.gz", decompress it to get the "tests" folder, and run powershell -command 'foreach ($i in 1..20) { Invoke-Pester tests/Scripting.Classes.BasicParsing.Tests.ps1 }'. This command runs that test file 20 times in a loop, and most of the time a hang occurs.
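A hung run never exits on its own, so a timeout is a convenient way to tell a hang apart from an ordinary test failure. Below is a minimal Python sketch of such a harness. It assumes `powershell` is on `PATH` and that the extracted `tests` folder is in the current directory. The helper name `run_with_timeout` and the ten-minute budget are hypothetical choices and were not part of the original report.

```python
import subprocess

# Command from the repro steps (assumes `powershell` is on PATH and the
# extracted `tests` folder is in the current working directory).
REPRO_CMD = [
    "powershell",
    "-command",
    "foreach ($i in 1..20) { Invoke-Pester tests/Scripting.Classes.BasicParsing.Tests.ps1 }",
]


def run_with_timeout(cmd, timeout_s):
    """Run cmd once; return 'ok', 'failed' (non-zero exit), or 'hang' (timed out)."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return "hang"
    return "ok" if result.returncode == 0 else "failed"
```

For example, `run_with_timeout(REPRO_CMD, 600)` returns `"hang"` if the 20 iterations have not finished within ten minutes. Because `subprocess.run` kills the child process when the timeout expires, a stuck `powershell` process is not left behind between attempts.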

Things worth mentioning

  1. I couldn't reproduce the hang when using IL assemblies on Linux and OSX.
  2. The crossgen’ed assemblies for Windows work fine. I never saw this hang happen in our AppVeyor CI builds. The hang only happens on Linux and OSX.
  3. Those PowerShell class parsing tests generate a lot of dynamic assemblies. PowerShell classes are essentially CLR types: PowerShell emits types and creates dynamic assemblies when it parses a PowerShell class in a script.

tests.tar.gz

[Screenshot: hang]

@SteveL-MSFT
Contributor

@rahku, is there any possibility we can get this fixed in 2.0.0 servicing?

jkotas unassigned rahku on Oct 9, 2017
@jkotas
Member

jkotas commented Oct 9, 2017

@rahku does not work on CoreCLR anymore.

cc @sergiy-k @russellhadley

@iSazonov
Contributor

iSazonov commented Oct 9, 2017

I'm surprised this still hasn't been addressed. It would be unpleasant if an Azure service using PowerShell Core were to freeze.
This is all the worse for users because we can't even get a dump.

@adityamandaleeka
Member

adityamandaleeka commented Oct 12, 2017

I took a look at this today and was able to reproduce the issue. I think I've figured out why this hang is occurring.

When the LoaderAllocator gets destroyed, it calls LoaderAllocator::GCLoaderAllocators, which tries to delete unreferenced domain assemblies. The destructor for Assembly calls Assembly::Terminate, which suspends the EE and then calls ExecutionManager::Unload. At this point, the ExecutionManager tries to delete code heaps, but to do so it must acquire a writer lock (which in turn requires that no readers are active). This thread is stuck waiting here because...

...on another thread, System.Management.Automation.LocationGlobber.ExpandMshGlobPath threw an ItemNotFoundException, and we're in the process of dispatching that exception. One of the first things we need to do is unwind to the first managed call frame. That means checking whether the code is managed, and to do so we first acquire the ExecutionManager's reader lock (because the scan flags for that thread tell us we need to). This is where ReadyToRun comes in: while holding the lock, we call JitCodeToMethodInfo, and the ReadyToRun version of that (ReadyToRunJitManager::JitCodeToMethodInfo) calls ReadyToRunInfo::GetMethodDescForEntryPoint, which does a hashmap lookup to find the MethodDesc corresponding to the entry point. However, HashMap::LookupValue calls RareDisablePreemptiveGC before doing anything else. Because the first thread has suspended the EE, that call blocks, and this thread gets stuck while still holding the reader lock it acquired earlier. Each thread is now waiting on the other: the unloading thread needs the reader lock released, and the exception-dispatching thread needs the EE to be resumed.
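The cycle described above is a lock-ordering deadlock, and it can be reproduced in miniature. The Python sketch below is a toy model, not CoreCLR code. `ReaderWriterLock` stands in for the ExecutionManager's reader/writer lock. The `ee_running` event stands in for EE suspension, where cleared means suspended. Waiting on that event stands in for RareDisablePreemptiveGC. All of these names, and `run_scenario`, are hypothetical.

With `switch_mode_under_lock=True`, the model follows the ordering in the analysis, and the writer cannot get the lock. With `False`, the model switches modes before taking the reader lock. That is one generic way to break such a cycle and is not necessarily what the actual fix does.

```python
import threading


class ReaderWriterLock:
    """Toy reader/writer lock: a writer must wait until no readers are active."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout_s):
        """Return True if acquired; False if readers were still active at timeout."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: self._readers == 0 and not self._writer, timeout_s
            )
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def run_scenario(switch_mode_under_lock, timeout_s=0.2):
    """Return True if the unloading thread obtains the writer lock in time."""
    lock = ReaderWriterLock()
    ee_running = threading.Event()  # cleared == EE suspended
    reader_at_blocking_point = threading.Event()

    def exception_dispatch():
        if switch_mode_under_lock:
            # Ordering from the analysis: take the reader lock, then switch mode.
            lock.acquire_read()
            reader_at_blocking_point.set()
            ee_running.wait()  # blocks while the EE is suspended, lock still held
            lock.release_read()
        else:
            # Alternative ordering: switch mode first, while holding no lock.
            reader_at_blocking_point.set()
            ee_running.wait()
            lock.acquire_read()
            lock.release_read()

    # The unloading thread (here, the caller) suspends the EE first...
    ee_running.clear()
    worker = threading.Thread(target=exception_dispatch)
    worker.start()
    reader_at_blocking_point.wait()
    # ...and then needs the writer lock to delete code heaps.
    acquired = lock.acquire_write(timeout_s)
    if acquired:
        lock.release_write()
    # Resume the EE so the toy model terminates. In the real hang, the writer
    # waits without a timeout, so this point is never reached.
    ee_running.set()
    worker.join()
    return acquired
```

`run_scenario(True)` returns `False`: the writer is stuck behind a reader that is itself waiting for the EE to resume. `run_scenario(False)` returns `True`. Unlike the runtime, the model's writer gives up after a timeout so that the demonstration terminates instead of hanging.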

@SteveL-MSFT
Contributor

@adityamandaleeka, it would be great if we could get a fix in 2.0.x servicing.

@adityamandaleeka
Member

adityamandaleeka commented Oct 13, 2017

@SteveL-MSFT @daxian-dbw I agree that this issue should be fixed. We'll try to find a good solution.

Out of curiosity, though, have you tried crossgen-ing the assemblies with the FragileNonVersionable switch? If you pass that switch to crossgen, it will generate non-ReadyToRun images, which should work around this issue (at least, that's what I'd assume based on my analysis above). The images generated with the FragileNonVersionable switch will be brittle (not resilient to changes in the runtime/framework or other dependencies), but as far as I can tell you ship all the dependencies in the PowerShell packages anyway, so that might be okay for you.

@jkotas
Member

jkotas commented Oct 13, 2017

> have you tried crossgen-ing the assemblies with the FragileNonVersionable switch

Those images are quite a bit bigger, and we do not have extensive testing for this configuration; it is quite likely you will hit different bugs.

@adityamandaleeka
Member

@SteveL-MSFT @daxian-dbw Update on this: I have a PR out with a fix in master. Once that's in, I'll go through the process to get it ported to the release/2.0.0 branch.

@daxian-dbw
Contributor Author

@adityamandaleeka Thanks for the fix! When will we have a servicing package that includes the fix? Could you please point me to any docs about how .NET Core servicing works?

@adityamandaleeka
Member

@daxian-dbw This will go into the next release after 2.0.3. There will be a pre-release build soon if you're interested in trying that out.

jkotas closed this as completed on Jan 17, 2018
msftgits transferred this issue from dotnet/coreclr on Jan 31, 2020
msftgits added this to the 2.1.0 milestone on Jan 31, 2020
ghost locked this as resolved and limited conversation to collaborators on Dec 25, 2020