Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dotnet build hangs while building libs.tests natively using cross built dotnet sdk #101121

Closed
alhad-deshpande opened this issue Apr 16, 2024 · 31 comments · Fixed by dotnet/roslyn#74994
Labels
arch-s390x Related to s390x architecture (unsupported) area-VM-meta-mono

Comments

@alhad-deshpande
Copy link
Contributor

alhad-deshpande commented Apr 16, 2024

Description

Dotnet build hangs while building libs.tests natively using cross built dotnet sdk.

Brief Analysis and probable root cause:

  1. The hang occurs while building project using msbuild runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate.csproj
  2. Mono trace log for "Microsoft.DotNet.HotReload.Utils.Generator" shows there is FileNotFoundException.
  3. Traced the filename using strace command and found that the missing file is "/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build.dll".
  4. If we check the directory contents for /root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/ then there is a file with name "Microsoft.Build.Locator.dll" but not "Microsoft.Build.dll".

Complete mono trace log for Microsoft.DotNet.HotReload.Utils.Generator namespace:

[root@devfirst-dotnet-rhel8-8-1 logs]# export MONO_ENV_OPTIONS="--trace=N:Microsoft.DotNet.HotReload.Utils.Generator"
[root@devfirst-dotnet-rhel8-8-1 logs]# /root/alhad/runtime/dotnet-sdk-ppc64le/.dotnet/dotnet /root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/build/../tools/net9.0/Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll -msbuild:/root/alhad/runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate.csproj -script:/root/alhad/runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/deltascript.json -p:Configuration=Debug
[0x7fff8a9a5160: 0.00000 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:Builder ()()
[0x7fff8a9a5160: 0.00019 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Config:Builder ()([.ConfigBuilder:0x7fff7c402bc8]
[0x7fff8a9a5160: 0.04096 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:.ctor (Microsoft.DotNet.HotReload.Utils.Generator.Config/ConfigBuilder)(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll], [.ConfigBuilder:0x7fff7c402bc8])
[0x7fff8a9a5160: 0.04116 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:.ctor (Microsoft.DotNet.HotReload.Utils.Generator.Config/ConfigBuilder)(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll], [.ConfigBuilder:0x7fff7c402bc8])
[0x7fff8a9a5160: 0.04145 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Config:.ctor (Microsoft.DotNet.HotReload.Utils.Generator.Config/ConfigBuilder)(
[0x7fff8a9a5160: 0.04146 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:.ctor (Microsoft.DotNet.HotReload.Utils.Generator.Config/ConfigBuilder)(
[0x7fff8a9a5160: 0.04437 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:Make (Microsoft.DotNet.HotReload.Utils.Generator.Config)([Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:0x7fff7c405820])
[0x7fff8a9a5160: 0.04444 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_Live ()(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll])
[0x7fff8a9a5160: 0.04445 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_Live ()(result=0
[0x7fff8a9a5160: 0.04500 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:Make (Microsoft.DotNet.HotReload.Utils.Generator.Config)([Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner:0x7fff7c405e60]
[0x7fff8a9a5160: 0.04516 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:Run (System.Threading.CancellationToken)(this:0x7fff7c405e60[Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll], [00,00,00,00,00,00,00,00,])
[0x7fff8a9a5160: 0.23871 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:Run (System.Threading.CancellationToken)([.AsyncStateMachineBox`1:0x7fff7c424560]
[0x7fff738cf140: 0.26416 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:PrepareCapabilities ()(this:0x7fff7c405e60[Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll])
[0x7fff738cf140: 0.26422 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_EditAndContinueCapabilities ()(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll])
[0x7fff738cf140: 0.26423 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_EditAndContinueCapabilities ()([System.String[]:0x7fff7c4025b8]
[0x7fff738cf140: 0.26446 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:.cctor ()()
[0x7fff738cf140: 0.28896 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:.cctor ()(
[0x7fff738cf140: 0.28897 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Parse (System.Collections.Generic.IEnumerable`1<string>)(XX)
[0x7fff738cf140: 0.28912 2] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Tokenize (System.Collections.Generic.IEnumerable`1<string>)(XX)
[0x7fff738cf140: 0.28918 2] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Tokenize (System.Collections.Generic.IEnumerable`1<string>)((unknown return type 15)
[0x7fff738cf140: 0.28940 2] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Parse (System.Collections.Generic.IEnumerable`1<Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser/Token>)(XX)
[0x7fff738cf140: 0.29011 2] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Parse (System.Collections.Generic.IEnumerable`1<Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser/Token>)((unknown return type 15)
[0x7fff738cf140: 0.29012 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Parse (System.Collections.Generic.IEnumerable`1<string>)((unknown return type 15)
[0x7fff738cf140: 0.29157 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:DefaultCapabilities ()(this:0x7fff7c405e60[Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll])
[0x7fff738cf140: 0.29158 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:DefaultCapabilities ()(result=1023
[0x7fff738cf140: 0.29162 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_NoWarnUnknownCapabilities ()(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll])
[0x7fff738cf140: 0.29163 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_NoWarnUnknownCapabilities ()(result=0
[0x7fff738cf140: 0.29180 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:PrepareCapabilities ()(result=1023
[0x7fff738cf140: 0.29196 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:SetupBaseline (Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)(this:0x7fff7c405e60[Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll], 1023, [00,00,00,00,00,00,00,00,])
[0x7fff738cf140: 0.29293 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:InitMSBuild ()()
[0x7fff738cf140: 1.12321 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:InitMSBuild ()(
[0x7fff738cf140: 1.12342 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject:Make (Microsoft.DotNet.HotReload.Utils.Generator.Config,Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)([Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:0x7fff7c405820], 1023, [00,00,00,00,00,00,00,00,])
[0x7fff738cf140: 1.12455 2] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject:PrepareMSBuildProject (Microsoft.DotNet.HotReload.Utils.Generator.Config,Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)([Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:0x7fff7c405820], 1023, [00,00,00,00,00,00,00,00,])
[0x7fff738cf140:] EXCEPTION handling: System.IO.FileNotFoundException:
[0x7fff738cf140: 2.18797 2] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject:PrepareMSBuildProject (Microsoft.DotNet.HotReload.Utils.Generator.Config,Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)((unknown return type 15)
[0x7fff738cf140: 2.18885 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject:Make (Microsoft.DotNet.HotReload.Utils.Generator.Config,Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)((unknown return type 15)
[0x7fff738cf140: 2.18978 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:SetupBaseline (Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)((unknown return type 15)
[0x7fff738cf140:] EXCEPTION handling: Newtonsoft.Json.JsonReaderException: Unexpected character encountered while parsing value: C. Path '', line 0, position 0.
[0x7fff738cf140:] EXCEPTION handling: Newtonsoft.Json.JsonReaderException: Unexpected character encountered while parsing value: C. Path '', line 0, position 0.

Reproduction Steps

  1. Cross build dotnet sdk on x86_64
  2. Native build runtime and libs.tests on any machine ppc64le/s390x or x86_64

Expected behavior

The dotnet native build should not hang

Actual behavior

The dotnet native build hangs.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 16, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 16, 2024
@uweigand
Copy link
Contributor

To add some additional context, just before the hang we see this error:

Could not load signature of Microsoft.CodeAnalysis.CSharp.CSharpProjectFileLoader:CreateProjectFile due to: Could not load file or assembly 'Microsoft.Build, Version=15.1.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies.

@jkotas jkotas added arch-s390x Related to s390x architecture (unsupported) area-VM-meta-mono and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 16, 2024
@tmds
Copy link
Member

tmds commented Apr 18, 2024

The issue isn't specific to ppc64le / s390 / cross-compilation. It reproduces also with building runtime tests with an x64 mono runtime on x64.

cc @lambdageek

@uweigand
Copy link
Contributor

uweigand commented May 2, 2024

We've done a bit more investigation on where this error is hit. We're running

dotnet
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/build/../tools/net9.0/Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll
-msbuild:/root/alhad/runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate.csproj
-script:/root/alhad/runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/deltascript.json
-p:Configuration=Debug

which ends up calling the PrepareMSBuildProject routine here https://github.com/dotnet/hotreload-utils/blob/c667b519c4bcdf121106fac021a53d98120e825a/src/Microsoft.DotNet.HotReload.Utils.Generator/BaselineProject.cs#L23
which calls msw.OpenProjectAsync.

This creates a new "BuildHost" process running

dotnet --roll-forward LatestMajor
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.CodeAnalysis.Workspaces.MSBuild.BuildHost.dll
--property Configuration=Debug

and uses the Microsoft.CodeAnalysis.MSBuild.Rpc.RpcServer:ProcessRequestAsync method to invoke the Microsoft.CodeAnalysis.Workspaces.MSBuild.BuildHost.BuildHost:LoadProjectFileAsync routine there, which tries to create an instance of Microsoft.CodeAnalysis.CSharp.CSharpProjectFileLoader:CreateProjectFile here: https://github.com/dotnet/roslyn/blob/fa7ece1ca25ef529db616ed6c839af9f027fa216/src/Workspaces/Core/MSBuild.BuildHost/BuildHost.cs#L180

This ends up in the Mono JIT attempting to load and initialize that type, which fails with:

Could not load signature of Microsoft.CodeAnalysis.CSharp.CSharpProjectFileLoader:CreateProjectFile due to:
Could not load file or assembly 'Microsoft.Build, Version=15.1.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies.

causing

An exception of type System.TypeLoadException was thrown:
VTable setup of type Microsoft.CodeAnalysis.CSharp.CSharpProjectFileLoader failed

The Mono JIT previously had attempted to search for Microsoft.Build unsuccessfully in these paths:

/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build.dll
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build.exe
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build/Microsoft.Build.dll
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build/Microsoft.Build.exe

Now, there's still a number of questions:

First of all, why is Microsoft.Build.dll not packaged with the microsoft.dotnet.hotreload.utils.generator.buildtool nupkg, even though a number of other Microsoft.Build assemblies are? If I'm understanding this correctly, that could be an effect of the ExcludeAssets="runtime" attribute here: https://github.com/dotnet/hotreload-utils/blob/c667b519c4bcdf121106fac021a53d98120e825a/src/Microsoft.DotNet.HotReload.Utils.Generator/Microsoft.DotNet.HotReload.Utils.Generator.csproj#L12 I'm not sure why this was added?

Second, if Microsoft.Build.dll is not found there, should it be taken from the host .NET SDK? The file is available there, but the Mono JIT doesn't even appear to look there. Not sure if this is as intended, or if there are still some Mono loader search order issues.

Finally, why does this error only show up in Mono? Either it is because of the above-mentioned search order, or else it might be another instance of the problem where Mono is more eager in requiring to load types during JIT that may not actually be referenced later (where CoreCLR would defer the type loading to a later stage, which might not ever happen in this case).

Any suggestion on how to proceed here would be appreciated!

CC @giritrivedi @directhex @akoeplinger @lambdageek

@janani66
Copy link

@YuliiaKovalova -- would you be able to take a look at this issue and advise?

@YuliiaKovalova
Copy link
Member

cc: @rainersigwald

@rainersigwald
Copy link
Member

First of all, why is Microsoft.Build.dll not packaged with the microsoft.dotnet.hotreload.utils.generator.buildtool nupkg

MSBuild assemblies should basically always come from the .NET SDK, not be repackaged by tools--MSBuild logic tends to make a bunch of assumptions about where .props/targets/etc files are relative to the loaded MSBuild DLLs, and it's also best to load the exact SDK that would be involved in dotnet build.

Second, if Microsoft.Build.dll is not found there, should it be taken from the host .NET SDK?

So yes. That's what MSBuildLocator does: it adds some AssemblyLoad event hooks to help find the right files in the SDK.

Finally, why does this error only show up in Mono? Either it is because of the above-mentioned search order, or else it might be another instance of the problem where Mono is more eager in requiring to load types during JIT that may not actually be referenced later (where CoreCLR would defer the type loading to a later stage, which might not ever happen in this case).

The latter behavior sounds very interesting. On .NET Framework and coreclr, the load is triggered by JIT compilation of the first method containing a type from Microsoft.Build.dll. Locator thus requires that you call Register() from a method that is JITed before calling the first such method.

If Mono is more aggressive about trying to load the type (say when the class containing such a method is first seen, rather than the method itself), that might explain the problem

@uweigand
Copy link
Contributor

Second, if Microsoft.Build.dll is not found there, should it be taken from the host .NET SDK?

So yes. That's what MSBuildLocator does: it adds some AssemblyLoad event hooks to help find the right files in the SDK.

Finally, why does this error only show up in Mono? Either it is because of the above-mentioned search order, or else it might be another instance of the problem where Mono is more eager in requiring to load types during JIT that may not actually be referenced later (where CoreCLR would defer the type loading to a later stage, which might not ever happen in this case).

The latter behavior sounds very interesting. On .NET Framework and coreclr, the load is triggered by JIT compilation of the first method containing a type from Microsoft.Build.dll. Locator thus requires that you call Register() from a method that is JITed before calling the first such method.

If Mono is more aggressive about trying to load the type (say when the class containing such a method is first seen, rather than the method itself), that might explain the problem

Thanks for the details! This does indeed look very similar to problems we've had in the past.

Here's a PR from a while back describing one of these problems, where some assembly was also supposed to the found via an event hook, but when using Mono the JIT already was trying (and failing) to load a type defined in that assembly before that hook was even installed: #60550 (comment)

In this case, the JIT also aborted when building a routine that merely refers to the class that contains an element of that type.

@uweigand
Copy link
Contributor

Do you think there is a way to rewrite the msbuild code so it avoids accessing even the containing class type before the handler is set up? Or can you suggest any other way to resolve this issue?

@rainersigwald
Copy link
Member

No, I don't think there's a way to rewrite MSBuild or MSBuildLocator here, unfortunately.

What we need is:

  • MSBuild can remain ignorant of Locator and "just uses types naturally" (the API has so many entry points we can't possibly hook load stuff around all of them).
  • User code can call Locator, then use MSBuild types without too much ceremony.

It might be possible to change microsoft.dotnet.hotreload.utils.generator.buildtool to call Locator "even earlier" (or rather call MSBuild through more class indirection) so that the call to Register() happens before the runtime tries to load MSBuild types? If there's a good pattern for that we could certainly document it for Locator users.

@uweigand
Copy link
Contributor

Hmm, looks like there is another twist - we're dealing with two distinct dotnet processes here. The main buildtool process runs:

    public async Task<BaselineArtifacts> SetupBaseline (EnC.EditAndContinueCapabilities capabilities, CancellationToken ct = default) {
        InitMSBuild();
        BaselineProject baselineProject = await Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject.Make (config, capabilities, ct);

where the first line calls

    private static void InitMSBuild ()
    {
        Microsoft.Build.Locator.MSBuildLocator.RegisterDefaults();
    }

and the second line calls

    static async Task<(EnC.ChangeMakerService, Solution, ProjectId)> PrepareMSBuildProject (Config config, EnC.EditAndContinueCapabilities capabilities, CancellationToken ct = default)
    {
                Microsoft.CodeAnalysis.MSBuild.MSBuildWorkspace msw;
[...]
                msw = Microsoft.CodeAnalysis.MSBuild.MSBuildWorkspace.Create(props);
[...]
                var project = await msw.OpenProjectAsync (config.ProjectPath, logger, null, ct);

and that last line starts a new process (as described in my comment above).

So we're calling the MSBuildLocator in the old process, but it is actually the new process where the JIT aborts due to the missing assembly. How is this supposed to work? A load hook set in the old process wouldn't be available in the new process, right? I see the MSBuildLocator also sets some environment variables, but I'm not sure I understand properly how these are intended to be used by child processes - should the child install its own load hook somewhere?

@rainersigwald
Copy link
Member

ah, that is very interesting-- @jasonmalinowski does this ring any bells for you?

@jasonmalinowski
Copy link
Member

With dotnet/roslyn#70469, MSBuildWorkspace does all its MSBuild operations in the separate process; it'll no longer load (or even use) MSBuild in-process. So based upon @uweigand's comment, unless something else is using MSBuild in the main process, the RegisterDefaults() line could just be removed (and potentially packages removed too.) The other process itself uses MSBuildLocator to locate MSBuild and then use it.

As far as ringing a bell, I had a similar gotcha at one point regarding mono with a scary-sounding comment here:

https://github.com/dotnet/roslyn/blob/2077862a15f4b955a4c5d139b82cb47087919e55/src/Workspaces/Core/MSBuild.BuildHost/BuildHost.cs#L145-L146

Fundamentally Mono's behavior for when the JIT needs a type to be present is different than our other runtimes, and we've been bitten before specifically around in this code since we need to ensure its registered early enough before we use types, but for Mono that's even a bit earlier than otherwise. So if there's a change that needs to be made to Roslyn I wouldn't be surprised (although I'm surprised we haven't heard another complaint about this.)

There is also likely a second bug here too, somewhere: even if that process is crashing catastrophically, MSBuildWorkspace should still be throwing exceptions to let the caller know something went bad. If that's not happening, that's also bad. If that is happening but the caller isn't itself ready for something like that, then the caller should also be fixed.

@uweigand
Copy link
Contributor

The other process itself uses MSBuildLocator to locate MSBuild and then use it.

Ah, right! That does make the problem look indeed like just another instance of #60550 (comment) ... looking at BuildHost:LoadProjectFileAsync:

    public async Task<int> LoadProjectFileAsync(string projectFilePath, string languageName, CancellationToken cancellationToken)
    {
        EnsureMSBuildLoaded(projectFilePath);
        CreateBuildManager();

        ProjectFileLoader projectLoader = languageName switch
        {
            LanguageNames.CSharp => new CSharp.CSharpProjectFileLoader(),
            LanguageNames.VisualBasic => new VisualBasic.VisualBasicProjectFileLoader(),
            _ => throw ExceptionUtilities.UnexpectedValue(languageName)
        };

        _logger.LogInformation($"Loading {projectFilePath}");
        var projectFile = await projectLoader.LoadProjectFileAsync(projectFilePath, _buildManager, cancellationToken).ConfigureAwait(false);
        return _server.AddTarget(projectFile);
    }

we see that the routine calls EnsureMSBuildLoaded to set up the MSBuildLocator and then creates a new CSharp.CSharpProjectFileLoader object (which requires the build locator to find the assembly where that type is defined).

With Mono, that type seems to be referenced already when the LoadProjectFileAsync routine is first compiled by the JIT - at this point, the EnsureMSBuildLoaded routine did not have a chance to run yet.

Maybe one workaround might be to move the switch statement setting up the appropriate loader per language into some subroutine? Or else move the EnsureMSBuildLoaded call earlier, i.e. into the caller(s) of LoadProjectFileAsync?

@jasonmalinowski
Copy link
Member

Maybe one workaround might be to move the switch statement setting up the appropriate loader per language into some subroutine?

Yep, that's what we need to do. That same pattern is already done other places in the code:

https://github.com/dotnet/roslyn/blob/bb5ea47485fc3fd044993a5828665d6df8459a10/src/Workspaces/Core/MSBuild.BuildHost/BuildHost.cs#L136-L143

Or else move the EnsureMSBuildLoaded call earlier, i.e. into the caller(s) of LoadProjectFileAsync?

The caller in this case is actually on the other side of the RPC channel, so that's not so easy.

@uweigand This change isn't something I'll be able to get to as I'm currently looking at a few other things -- is this a change you're comfortable making (and testing) or do you need somebody else on our side to do it?

uweigand added a commit to uweigand/roslyn that referenced this issue Jul 15, 2024
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make
sure the Microsoft.Build.dll is accessible, but requires types from
that assembly within the routine itself.  This may not always work
with Mono, as the JIT may look up the types during compilation.

Fixed by splitting out a LoadProjectFileAsyncCore routine, similarly
to what is already done for GetProjectsInSolution.

Fixes dotnet/runtime#101121.
@uweigand
Copy link
Contributor

Maybe one workaround might be to move the switch statement setting up the appropriate loader per language into some subroutine?

Yep, that's what we need to do. That same pattern is already done other places in the code:

https://github.com/dotnet/roslyn/blob/bb5ea47485fc3fd044993a5828665d6df8459a10/src/Workspaces/Core/MSBuild.BuildHost/BuildHost.cs#L136-L143

Ah, right! This is exactly what we need to do here as well.

Or else move the EnsureMSBuildLoaded call earlier, i.e. into the caller(s) of LoadProjectFileAsync?

The caller in this case is actually on the other side of the RPC channel, so that's not so easy.

@uweigand This change isn't something I'll be able to get to as I'm currently looking at a few other things -- is this a change you're comfortable making (and testing) or do you need somebody else on our side to do it?

I've now opened dotnet/roslyn#74392, which fixes the problem for me.

However, in addition to getting this upstream I guess we'll need to make sure this actually gets used by the runtime tests. As far as I can see, this would mean creating a new Microsoft.CodeAnalysis.Workspaces.MSBuild package version (maybe backporting the fix to the 4.11 branch?), pulling that new package into microsoft.dotnet.hotreload.utils.generator.buildtool (which seems to be an internal package?), creating a new version of that package and pulling that into the runtime. Am I missing anything here?

uweigand added a commit to uweigand/roslyn that referenced this issue Jul 15, 2024
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make
sure the Microsoft.Build.dll is accessible, but requires types from
that assembly within the routine itself.  This may not always work
with Mono, as the JIT may look up the types during compilation.

Fixed by splitting out a LoadProjectFileCoreAsync routine, similarly
to what is already done for GetProjectsInSolution.

Fixes dotnet/runtime#101121.
@jasonmalinowski
Copy link
Member

@uweigand: so once that's merged we'll have new builds. @tmat are you familiar with the Hot Reload utilities here to understand the code flow at this point?

@tmat
Copy link
Member

tmat commented Jul 16, 2024

I am not. @lambdageek might be.

uweigand added a commit to uweigand/roslyn that referenced this issue Jul 17, 2024
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make
sure the Microsoft.Build.dll is accessible, but requires types from
that assembly within the routine itself.  This may not always work
with Mono, as the JIT may look up the types during compilation.

Fixed by splitting out a LoadProjectFileCoreAsync routine, similarly
to what is already done for GetProjectsInSolution.

Fixes dotnet/runtime#101121.
uweigand added a commit to uweigand/roslyn that referenced this issue Jul 17, 2024
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make
sure the Microsoft.Build.dll is accessible, but requires types from
that assembly within the routine itself.  This may not always work
with Mono, as the JIT may look up the types during compilation.

Fixed by splitting out a LoadProjectFileCoreAsync routine, similarly
to what is already done for GetProjectsInSolution.

Fixes dotnet/runtime#101121.
jaredpar pushed a commit to jaredpar/roslyn that referenced this issue Sep 4, 2024
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make
sure the Microsoft.Build.dll is accessible, but requires types from
that assembly within the routine itself.  This may not always work
with Mono, as the JIT may look up the types during compilation.

Fixed by splitting out a LoadProjectFileCoreAsync routine, similarly
to what is already done for GetProjectsInSolution.

Fixes dotnet/runtime#101121.
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Sep 5, 2024
@uweigand
Copy link
Contributor

uweigand commented Sep 5, 2024

While the roslyn patch has now been merged (thanks!), this issue should not be closed quite yet. The new assembly still needs to be picked up by microsoft.dotnet.hotreload.utils.generator.buildtool and then the runtime, see above.

Can someone with the necessary authority please re-open this issue?

@jasonmalinowski
Copy link
Member

@tmat Do you know if microsoft.dotnet.hotreload.utils.generator.buildtool will pick up the changes automatically or does something further need to be done?

@uweigand Practically if there's no further work to do in this repo, it'd be best to keep this bug closed and have another bug tracking picking up the change if needed. But do we have explicit work to do somewhere or will this flow with the usual darc flows?

@tmat
Copy link
Member

tmat commented Sep 5, 2024

I have no idea what this tool is/does.

@uweigand
Copy link
Contributor

uweigand commented Sep 6, 2024

So here's what I can see:

So it looks like the main problem is that the fix was merged to roslyn mainline, but not the 17.11 branch (and it doesn't look like there's any automatic merging here). This means in order to fix this issue for the runtime tests, we'd have to either backport the roslyn fix to 17.11, or else update hotreload-utils to pull in a more recent roslyn version (not sure what other effects this might have).

@jasonmalinowski
Copy link
Member

@akoeplinger or @lambdageek can you help us with the flow question here? We've made a fix in Roslyn but not sure what branching is all happening here.

@lambdageek

This comment was marked as outdated.

@lambdageek
Copy link
Member

Actually @jasonmalinowski does the roslyn->hotreload-utils subscription look right? do we need hotreload-utils main to pick up the VS 17.12 channel instead? Is that where main work is happening now?

@jasonmalinowski
Copy link
Member

@lambdageek I have no idea how any of this works. @dotnet/roslyn-infrastructure or @jaredpar maybe you can help?

@rainersigwald
Copy link
Member

Roslyn main should be the VS 17.12 channel now.

@thaystg
Copy link
Member

thaystg commented Sep 6, 2024

We have just updated the hotreload-utils subscriptions:

https://github.com/dotnet/roslyn (VS 17.12) ==> 'https://github.com/dotnet/hotreload-utils' ('main')
  - Id: 382b0e78-e254-45c6-24d7-08d9158524d7
  - Update Frequency: EveryWeek
  - Enabled: True
  - Batchable: True
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies:
    Standard
  - Last Build: 20240906.1 (1ce264cab372119376a734a247e9f60fe898877a)
https://github.com/dotnet/roslyn (VS 17.3) ==> 'https://github.com/dotnet/hotreload-utils' ('release/6.0')
  - Id: 502362e8-82de-4374-ae45-08db360867bf
  - Update Frequency: EveryWeek
  - Enabled: True
  - Batchable: True
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies: []
  - Last Build: 20221026.13 (41a5af9d2c459a06c0795bf21a1c046200f375bf)
https://github.com/dotnet/roslyn (VS 17.8) ==> 'https://github.com/dotnet/hotreload-utils' ('release/8.0')
  - Id: cbf2cd55-5246-40b3-c9a2-08dbcf1dbc26
  - Update Frequency: EveryWeek
  - Enabled: True
  - Batchable: False
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies:
    Standard
  - Last Build: 20240823.6 (d8282e36f9e389ee9eaad96b5aba66b83b9a37fd)
https://github.com/dotnet/roslyn (VS 17.12) ==> 'https://github.com/dotnet/hotreload-utils' ('release/9.0')
  - Id: d81ef602-8555-458e-ac21-8113ab0aea0a
  - Update Frequency: EveryWeek
  - Enabled: True
  - Batchable: False
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies:
    Standard
  - Last Build: 20240906.1 (1ce264cab372119376a734a247e9f60fe898877a)

and

https://github.com/dotnet/hotreload-utils (.NET 10) ==> 'https://github.com/dotnet/runtime' ('main')
  - Id: bfe6dacf-8231-4ea1-e2fe-08d962847885
  - Update Frequency: EveryDay
  - Enabled: True
  - Batchable: True
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies:
    AllChecksSuccessful
      ignoreChecks =
                       [
                         "WIP",
                         "license/cla",
                         "Build Analysis",
                         "runtime-coreclr superpmi-diffs",
                         "runtime-coreclr superpmi-replay",
                         "runtime-coreclr jitstress-isas-avx512",
                         "runtime-libraries enterprise-linux",
                         "runtime-wasm-perf"
                       ]
    NoRequestedChanges
  - Last Build: 20240903.1 (22ca60ef848b380d6665318e02694f073a534133)
https://github.com/dotnet/hotreload-utils (.NET 6) ==> 'https://github.com/dotnet/runtime' ('release/6.0')
  - Id: 206e9efc-bf5a-4d71-5108-08db3604d504
  - Update Frequency: None
  - Enabled: True
  - Batchable: False
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies: []
https://github.com/dotnet/hotreload-utils (.NET 6) ==> 'https://github.com/dotnet/runtime' ('release/6.0-staging')
  - Id: dd5d0542-c4f7-4567-5ca9-08db3604d60f
  - Update Frequency: EveryBuild
  - Enabled: True
  - Batchable: False
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies: []
  - Last Build: 20240802.2 (036e630c228a06e61db385ecbe2b96ad995b0f61)
https://github.com/dotnet/hotreload-utils (.NET 8) ==> 'https://github.com/dotnet/runtime' ('release/8.0')
  - Id: 479b3557-3b19-4693-a785-08db9e4038dc
  - Update Frequency: None
  - Enabled: True
  - Batchable: False
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies: []
  - Last Build: 20231023.2 (8e108a21d0c7d8fa2050a1bdd4d4ba50d2b8df13)
https://github.com/dotnet/hotreload-utils (.NET 8) ==> 'https://github.com/dotnet/runtime' ('release/8.0-staging')
  - Id: c0fef358-3848-4a30-a438-08dbcfd61a5a
  - Update Frequency: EveryBuild
  - Enabled: True
  - Batchable: False
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies: []
  - Last Build: 20240903.2 (5339e12def2a3605d069c429840089ae27838728)
https://github.com/dotnet/hotreload-utils (.NET 9) ==> 'https://github.com/dotnet/runtime' ('release/9.0')
  - Id: 6181ce50-ca7a-4061-0a4d-08dcbc302537
  - Update Frequency: EveryDay
  - Enabled: True
  - Batchable: False
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies: []
  - Last Build: 20240903.1 (22ca60ef848b380d6665318e02694f073a534133)
https://github.com/dotnet/hotreload-utils (.NET 9) ==> 'https://github.com/dotnet/runtime' ('release/9.0-staging')
  - Id: b458125c-6cf0-41ea-981d-2709b660d5c5
  - Update Frequency: EveryBuild
  - Enabled: True
  - Batchable: False
  - PR Failure Notification tags:
  - Source-enabled: False
  - Merge Policies: []

@lambdageek
Copy link
Member

lambdageek commented Sep 6, 2024

@uweigand sounds like we can also remove the dependency on MSBuildLocator from hotreload-utils? As far as I know we don't use it for anything besides just setting up the workspace for Roslyn, which now shouldn't be necessary

Looks like we already got rid of it earlier dotnet/hotreload-utils#379

@lambdageek
Copy link
Member

lambdageek commented Sep 6, 2024

I have no idea what this tool is/does.

It's used for running hot reload unit tests in the dotnet/runtime repo against mono and coreclr via the ApplyUpdate API. The testsuite in dotnet/runtime is primarily a smoketest to notice regression faster (more extensive testing goes through the debugger).

@uweigand
Copy link
Contributor

uweigand commented Sep 9, 2024

Thanks! I now see that hotreload-utils (both main and release/9.0) pulls in a current roslyn package including the fix. However,
the runtime (either main or .NET 9) is still pulling in an old hotreload-utils package from Aug 19. Are there any further manual actions necessary, or will this get updated automatically soon?

@lambdageek
Copy link
Member

Are there any further manual actions necessary, or will this get updated automatically soon?

The dotnet-maestro bot will make the PRs. There should be no more manual interventions (other than waiting for green CI):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-s390x Related to s390x architecture (unsupported) area-VM-meta-mono
Projects
None yet