Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Migrate from catch_all to catch in the jiterpreter and destroy exceptions #91364

Merged
merged 4 commits into from
Oct 5, 2023

Conversation

kg
Copy link
Contributor

@kg kg commented Aug 30, 2023

This PR migrates from using the catch_all opcode in the jiterpreter to using catch with a specific tag, which partially fixes #90692.

Doing this requires build changes in order to ensure that the c++ exception handling infrastructure is preserved by the linker and exported so it becomes visible to the jiterpreter. In addition to the transition from catch_all to catch we now engage the c++ exception handling infrastructure to free thrown exceptions, since previously they would have leaked.

This PR also removes the jiterpreter's do_jit_call/jit_call_cb dispatcher - we no longer need it for runtime builds using Wasm EH (the code clang generates in that mode has equivalent or better performance).

The rest of the fix for 90692 is being handled in another PR.

@kg kg added arch-wasm WebAssembly architecture area-Codegen-Interpreter-mono labels Aug 30, 2023
@ghost
Copy link

ghost commented Aug 30, 2023

Tagging subscribers to this area: @BrzVlad, @kotlarmilos
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR migrates from using the 'catch_all' opcode in the jiterpreter to using 'catch' with a specific tag, which should fix #90692.

Doing this requires build changes in order to ensure that the c++ exception handling infrastructure is preserved by the linker and exported so it becomes visible to the jiterpreter. I'm still not sure what exact set of changes is necessary because it turns out the way emscripten parses exception-related options is Strange(tm) and we need to have the exact right settings at both compile time and link time for it to work.

Not ready for merge yet, but I want to see what breaks on CI.

Author: kg
Assignees: -
Labels:

arch-wasm, area-Codegen-Interpreter-mono

Milestone: -

@kg
Copy link
Contributor Author

kg commented Sep 1, 2023

Requesting review because the functionality parts of this are basically done, and I reduced the build changes to the ones that seem necessary to me. I'm still not sure how we should actually do these build configuration changes though, so I hope someone can help figure that out. The situation as I understand it:

  • All of emscripten's exception-related flags interact with each other in strange ways, so our previous configuration was no good, since we could end up passing 3 different exception flags to a single compiler or linker invocation
  • Exception configuration matters both at build time and link time and we need to match them so that exception handling machinery flows through correctly. Before this we were getting linked modules that contained a mix of both JS and native WASM exception handling, or the linker was failing to find the exception handling machinery even though we compiled with exceptions on.
  • I'm not sure there is a way for the jiterpreter to be fully compatible with emscripten's JS exception handling; if there is, this PR doesn't solve that. I only was able to verify that with these changes we specifically trap C++ exceptions and call the c++ catch machinery that appears to free the heap allocations backing them. I still need to test [Mono WASM + AOT] Environment.Exit does not work together with exception handling #90692 in various configurations to make sure it's fixed in as many of them as possible.

@pavelsavara
Copy link
Member

The linked issue was AOT issue, does this address non-AOT too ?
Does this change overall size or performance of the application ?
Does this change re-link time with workload ?

@lambdageek
Copy link
Member

lambdageek commented Sep 1, 2023

  • I'm not sure there is a way for the jiterpreter to be fully compatible with emscripten's JS exception handling; if there is, this PR doesn't solve that.

It looks like the way Emscripten implements C++ exceptions when wasm exceptions are disabled is by doing a rewrite pass in LLVM that throws a JS exception on a throw and that wraps all functions with catch clauses (well... landingpads... but source-level it's catch clauses) in a JS try/catch. details here:
https://llvm.org/doxygen/WebAssemblyLowerEmscriptenEHSjLj_8cpp.html#details

we might be able to emulate this rewrite in the jiterpreter. and possibly mess with the functions in library_exceptions.js to add some mono specific behavior on catches. but i'm not sure how complex that will be

@kg
Copy link
Contributor Author

kg commented Sep 1, 2023

The linked issue was AOT issue, does this address non-AOT too ? Does this change overall size or performance of the application ? Does this change re-link time with workload ?

All the code in question only runs for AOTed applications as far as I know. I don't know what the impact on link time or generated code are yet.

@kg
Copy link
Contributor Author

kg commented Sep 1, 2023

  • I'm not sure there is a way for the jiterpreter to be fully compatible with emscripten's JS exception handling; if there is, this PR doesn't solve that.

It looks like the way Emscripten implements C++ exceptions when wasm exceptions are disabled is by doing a rewrite pass in LLVM that throws a JS exception on a throw and that wraps all functions with catch clauses (well... landingpads... but source-level it's catch clauses) in a JS try/catch. details here: https://llvm.org/doxygen/WebAssemblyLowerEmscriptenEHSjLj_8cpp.html#details

we might be able to emulate this rewrite in the jiterpreter. and possibly mess with the functions in library_exceptions.js to add some mono specific behavior on catches. but i'm not sure how complex that will be

It looks like this version of __cxa_begin_catch is written in JS, so it's possible we can do something similar to what I'm already doing and it will still work... the trick is figuring out what shape emscripten's JS exceptions have. There's probably a tag property I can check for. The jiterpreter does have JS fallbacks already so it's a matter of customizing them I think.

@kg
Copy link
Contributor Author

kg commented Sep 1, 2023

Part way through getting this all to work with wasm exception handling disabled.

@kg kg added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Sep 1, 2023
@kg kg removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Sep 5, 2023
@kg
Copy link
Contributor Author

kg commented Sep 5, 2023

Should work now in both native WASM EH mode and JavaScript EH

Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how we can look at WasmEnableExceptionHandling in mono.proj and wasm.proj

Comment on lines 411 to 412
<_AppropriateWasmExceptionFlags Condition="'$(WasmEnableExceptionHandling)' == 'false'">-fexceptions</_AppropriateWasmExceptionFlags>
<_AppropriateWasmExceptionFlags Condition="'$(WasmEnableExceptionHandling)' != 'false'">-fwasm-exceptions</_AppropriateWasmExceptionFlags>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? Users can set WasmEnableExceptionHandling to either true or false in their own projects - which gets picked up by WasmApp.targets. But it can't affect how the runtime is compiled - that happens once in CI and then gets packed up into a nuget.

So we build the runtime with whatever WasmEnableExceptionHandling value happens to be in effect on CI (I think right now the default is true, right?). But if a user builds their own project with false we will still link it with a runtime built with -fexceptions. Is that reasonable? I wouldn't think so... the runtime's code wouldn't be using the wasm EH instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem is that (as far as i can tell) even if you don't use EH, building with -fexceptions causes the emscripten JS exception handling glue to get linked in, and if you build with -fwasm-exceptions it links in C++ exception handling glue instead. So if we build with a mix of both, we have trouble, aside from the question of "what happens if you put a try/catch in your code".

It's possible we can get away with these flags being wrong as long as the flags are correct at the point of the final link. I'm not really sure, I don't fully understand all these different parts of our build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this, I'm still not confident in it but it seems to work.

src/mono/wasm/wasm.proj Outdated Show resolved Hide resolved
@kg
Copy link
Contributor Author

kg commented Sep 12, 2023

This does not appear to fix #90692, but from testing in the debugger that's because Environment.Exit is not throwing a JS exception, so even if I use catch with a tag we still catch whatever is being thrown.

The problem appears to be somewhere in the AOT generated code, we've got catch_all opcodes in dotnet.native.wasm that are trapping the JS exception and turning it into a C++ exception, which the jiterpreter then catches and dutifully sets the thrown flag for.

image

@kg kg requested a review from lambdageek September 12, 2023 03:30
@vargaz
Copy link
Contributor

vargaz commented Sep 12, 2023

Wouldn't be simpler to use the existing mono_llvm_cpp_catch_exception with a callback ?
Also, for the concrete problem, wouldn't be possible to change Exit () to not throw an exception ?

@kg
Copy link
Contributor Author

kg commented Sep 12, 2023

Wouldn't be simpler to use the existing mono_llvm_cpp_catch_exception with a callback ?

That doesn't appear to fix this. The purpose of the jiterp stuff is to eliminate the indirect call per transition, which gives a measurable speedup. With this PR's changes the three modes (mono_llvm_cpp_catch_exception, jiterpreter's generic dispatcher, and jiterpreter's specialized dispatchers) should generate equivalent code from what I know.

Also, for the concrete problem, wouldn't be possible to change Exit () to not throw an exception ?

I'm not sure what we would replace it with, but yeah. Another option would be to change the interp.c logic so that if it sees no exception data is available, it knows a Special Exception was thrown that is supposed to fully unwind the stack (which would be Environment.Exit or Thread.Abort, I think?) and proceeds to unwind. But the original Special Exception is still gone without a trace because some part of our AOT machinery ate it.

We would still have a problem though because libc code expects to be able to abort and we would trap those exceptions too I think.

@kg
Copy link
Contributor Author

kg commented Sep 14, 2023

Updated to remove the jiterpreter's generic jitcall dispatcher - it's no longer necessary since Wasm EH is the default for the runtime now. It was only useful for JS-based EH. This removes some potential failure cases, should reduce the size of the runtime JS a little bit, and reduces code complexity.

@lewing
Copy link
Member

lewing commented Sep 15, 2023

Does adding a test case make sense?

@kg
Copy link
Contributor Author

kg commented Sep 15, 2023

I'm not sure how we would write a test for this, since Environment.Exit would kill the test host. Maybe something that does a bunch of hops through JS and C# would work...

src/mono/wasm/runtime/jiterpreter-feature-detect.ts Outdated Show resolved Hide resolved
@@ -291,6 +291,8 @@ export interface t_Cwraps {
// returns new size of queue after add
mono_jiterp_tlqueue_add(queue: number, value: VoidPtr): number;
mono_jiterp_tlqueue_clear(queue: number): void;
mono_jiterp_begin_catch(ptr: number): void;
mono_jiterp_end_catch(): void;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this always throws, right ? In that case the return type should be never not void

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't always throw

src/mono/wasm/runtime/jiterpreter-jit-call.ts Outdated Show resolved Hide resolved
builder.appendU8(WasmOpcode.catch_all);
builder.appendU8(WasmOpcode.catch_);
builder.appendULeb(builder.getTypeIndex("__cpp_exception"));
builder.callImport("begin_catch");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the JS code in mono_interp_invoke_wasm_jit_call_trampoline has more branches than this. Could you please explain what is the difference ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside this branch we know wasm EH is enabled, so we by definition will not encounter a JS exception... I think. Even if we did, we couldn't handle it natively using wasm code - we would have to use JS.

invoke_wasm_jit_call_trampoline on the other hand needs to work whether wasm EH is enabled or disabled. It is responsible for handling JS exceptions on behalf of trampolines (which can't handle them).

const result: any = {
c: <any>this.getConstants(),
m: { h: memory },
};
if (exceptionTag)
result.x = { e: exceptionTag };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure x is great name if the exception could ever hit or come from customer code. It may conflict there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these 'x' and 'e' names are wasm import names used by jiterpreter modules

this.appendName("x");
this.appendName("e");
// tagtype
this.appendU8(0x04);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make those named constants

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are binary constants from the webassembly spec's binary format, the comment above them explains what they are.

@pavelsavara
Copy link
Member

I'm not sure how we would write a test for this, since Environment.Exit would kill the test host. Maybe something that does a bunch of hops through JS and C# would work...

I don't understand how this is related to Environment.Exit. Is that the only way how the code in this PR is triggered ?

Would dereferencing null in C# also cause trigger this code ?

kg added 4 commits October 2, 2023 17:35
… tag from our wasm module

Expose the cxa catch functions to jiterp

Import the exception tag type from dotnet.wasm when generating jiterpreter modules

Checkpoint

Maybe fix build on CI

Build flag cleanup

Clean up build flags

Properly free C++ exceptions in do_jit_call_indirect_js and don't trap non-C++ exceptions

Properly clean up exceptions in the fallback trampoline invoke routine (though this shouldn't be necessary)

Call begin_catch/end_catch in generated jitcall trampolines

Checkpoint: Support for emscripten's JS-based exception handling

Disable log messages
@@ -207,7 +207,7 @@
<ActualWasiSdkVersion>%(_ActualVersionLines.Identity)</ActualWasiSdkVersion>
<ExpectedWasiSdkVersion>%(_ExpectedVersionLines.Identity)</ExpectedWasiSdkVersion>
</PropertyGroup>
<Error Text="Expected and actual version of WASI SDK does not match. Please delete $(WASI_SDK_PATH) folder to provision a new version."
<Error Text="Expected and actual version of WASI SDK does not match. Please delete $(WASI_SDK_PATH) folder to provision a new version."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@radical
Copy link
Member

radical commented Oct 3, 2023

/azp run runtime-wasm

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kg
Copy link
Contributor Author

kg commented Oct 4, 2023

Tests were green in the past, but these two failures look like they could be related. I'm not sure how this would cause a timeout though.

I'm not sure why there is a WBT failure in the test analysis but it's not showing up in the checks list...

@radical
Copy link
Member

radical commented Oct 4, 2023

I can see only one failure, the debugger tests. Just fyi, you might want to run that again because I have seen this failure on debugger tests on main too.

Copy link
Member

@pavelsavara pavelsavara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kg
Copy link
Contributor Author

kg commented Oct 5, 2023

Threading failure is real, looking into it

smoke: running LockTest
dotnet.native.js:987 LockTest A ManagedThreadId: 1
dotnet.native.js:987 LockTest B ManagedThreadId: 1
dotnet.native.js:987 LockTest C ManagedThreadId: 1
dotnet.native.js:987 LockTest D ManagedThreadId: 2
dotnet.native.js:987 LockTest E ManagedThreadId: 2
dotnet.native.js:987 LockTest F ManagedThreadId: 2
main.js:29 smoke: LockTest done 
main.js:32 smoke: running DisposeTest
dotnet.native.js:987 DisposeTest A ManagedThreadId: 1
dotnet.native.js:987 DisposeTest 0 ManagedThreadId: 1
dotnet.native.js:987 DisposeTest 1 ManagedThreadId: 2
dotnet.native.js:987 DisposeTest 2 ManagedThreadId: 2
dotnet.native.js:987 DisposeTest 3 ManagedThreadId: 2
dotnet.native.js:987 DisposeTest 4 ManagedThreadId: 2
dotnet.native.js:987 DisposeTest 5 ManagedThreadId: 2
main.js:34 smoke: DisposeTest done 
main.js:36 smoke: running TestHelloWebWorker
dotnet.native.js:987 smoke: TestHelloWebWorker 1 ManagedThreadId:1, SynchronizationContext: System.Runtime.InteropServices.JavaScript.JSSynchronizationContext
dotnet.native.js:987 smoke: TestHelloWebWorker 5 ManagedThreadId:1, SynchronizationContext: System.Runtime.InteropServices.JavaScript.JSSynchronizationContext
dotnet.native.js:987 smoke: TestHelloWebWorker 2 ManagedThreadId:4, SynchronizationContext: null
dotnet.native.js:987 smoke: TestHelloWebWorker 6 ManagedThreadId:2, SynchronizationContext: null
dotnet.native.js:987 smoke: TestHelloWebWorker 1 ManagedThreadId:1, SynchronizationContext: System.Runtime.InteropServices.JavaScript.JSSynchronizationContext
dotnet.native.js:987 smoke: TestHelloWebWorker 5 ManagedThreadId:1, SynchronizationContext: System.Runtime.InteropServices.JavaScript.JSSynchronizationContext
dotnet.native.js:987 smoke: TestHelloWebWorker 2 ManagedThreadId:6, SynchronizationContext: null
dotnet.native.js:987 smoke: TestHelloWebWorker 6 ManagedThreadId:5, SynchronizationContext: null
dotnet.native.js:987 smoke: TestHelloWebWorker 1 ManagedThreadId:1, SynchronizationContext: System.Runtime.InteropServices.JavaScript.JSSynchronizationContext
dotnet.native.js:987 smoke: TestHelloWebWorker 5 ManagedThreadId:1, SynchronizationContext: System.Runtime.InteropServices.JavaScript.JSSynchronizationContext
dotnet.native.js:987 smoke: TestHelloWebWorker 2 ManagedThreadId:7, SynchronizationContext: null
dotnet.native.js:987 smoke: TestHelloWebWorker 6 ManagedThreadId:5, SynchronizationContext: null
dotnet.native.js:987 smoke: TestHelloWebWorker 1 ManagedThreadId:1, SynchronizationContext: System.Runtime.InteropServices.JavaScript.JSSynchronizationContext
logging.ts:32 MONO_WASM:    at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object , BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.RuntimeMethodInfo.Invoke(Object , BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.MethodBase.Invoke(Object , Object[] )
   at System.Runtime.InteropServices.JavaScript.WebWorker.RunAsync(Action , CancellationToken )
   at System.Runtime.InteropServices.JavaScript.WebWorker.RunAsync(Action )
   at Sample.Test.TestHelloWebWorker()
   at Sample.Test.__Wrapper_TestHelloWebWorker_1439294355(JSMarshalerArgument* __arguments_buffer)
Error: Arg_TargetInvocationException
    at ar (https://localhost:45941/_framework/dotnet.runtime.js:3:33479)
    at xo (https://localhost:45941/_framework/dotnet.runtime.js:3:66167)
    at Object.<anonymous> (https://localhost:45941/_framework/dotnet.runtime.js:3:207851)
    at https://localhost:45941/main.js:40:31
u @ logging.ts:32
logging.ts:20 WASM EXIT 2
localhost/:1 Uncaught    at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object , BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.RuntimeMethodInfo.Invoke(Object , BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.MethodBase.Invoke(Object , Object[] )
   at System.Runtime.InteropServices.JavaScript.WebWorker.RunAsync(Action , CancellationToken )
   at System.Runtime.InteropServices.JavaScript.WebWorker.RunAsync(Action )
   at Sample.Test.TestHelloWebWorker()
   at Sample.Test.__Wrapper_TestHelloWebWorker_1439294355(JSMarshalerArgument* __arguments_buffer)
Error: Arg_TargetInvocationException
    at ar (https://localhost:45941/_framework/dotnet.runtime.js:3:33479)
    at xo (https://localhost:45941/_framework/dotnet.runtime.js:3:66167)
    at Object.<anonymous> (https://localhost:45941/_framework/dotnet.runtime.js:3:207851)
    at https://localhost:45941/main.js:40:31

@kg
Copy link
Contributor Author

kg commented Oct 5, 2023

Looks like it's not my fault and the sample was already broken:

Uncaught    at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object , BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.RuntimeMethodInfo.Invoke(Object , BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.MethodBase.Invoke(Object , Object[] )
   at System.Runtime.InteropServices.JavaScript.WebWorker.RunAsync(Action , CancellationToken )
   at System.Runtime.InteropServices.JavaScript.WebWorker.RunAsync(Action )
   at Sample.Test.TestHelloWebWorker()
   at Sample.Test.__Wrapper_TestHelloWebWorker_1643690243(JSMarshalerArgument* __arguments_buffer)
Error: TargetInvocationException: inner=System.Threading.ThreadStartException: Arg_ThreadStartException
 ---> System.ExecutionEngineException: mono_thread_platform_create_thread() failed
   Exception_EndOfInnerExceptionStack
   at System.Threading.Thread.ThrowThreadStartException(Exception ex)
   at System.Threading.Thread.StartCore()
   at System.Threading.Thread.Start(Boolean , Boolean )
   at System.Threading.Thread.Start()
   at System.Runtime.InteropServices.JavaScript.WebWorker.Run(Action body, CancellationToken cancellationToken)
   at System.Object.InvokeStub_WebWorker.Run(Object , Span`1 )
   at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object , BindingFlags , Binder , Object[] , CultureInfo )
    at ar (https://localhost:33239/_framework/dotnet.runtime.js:3:33479)
    at xo (https://localhost:33239/_framework/dotnet.runtime.js:3:66167)
    at Object.<anonymous> (https://localhost:33239/_framework/dotnet.runtime.js:3:207851)
    at https://localhost:33239/main.js:45:31
logging.ts:131 

Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmake build bits LGTM

Copy link
Member

@radical radical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msbuild bits look good

@kg kg merged commit 4bdd490 into dotnet:main Oct 5, 2023
117 checks passed
@ghost ghost locked as resolved and limited conversation to collaborators Nov 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Mono WASM + AOT] Environment.Exit does not work together with exception handling
6 participants