Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wasm marshaler API for user-defined types; migrate Uri and DateTime to new API #47640

Closed
wants to merge 3 commits into from

Conversation

kg
Copy link
Member

@kg kg commented Jan 29, 2021

This PR introduces a system for defining custom JS➔Managed and Managed➔JS marshaling primitives so that any* user-defined type can cross between environments seamlessly, and migrates our existing Uri and DateTime marshaling logic to use the new system.

First, some sample code:

    public static class DateTimeMarshaler
    {
        public static string JavaScriptToInterchangeTransform => "return value.toISOString()";
        public static string InterchangeToJavaScriptTransform => "return new Date(value)";

        public static DateTime FromJavaScript (string s)
        {
            return DateTime.Parse(s).ToUniversalTime();
        }

        public static string ToJavaScript (in DateTime dt)
        {
            return dt.ToString("o");
        }
    }

General overview:

  • The author of the type exposes static methods with specific names and signatures on the type so that the wasm bindings layer can find them.
    • Right now there is a manual linker configuration to keep our built-in marshalers from being linked out. The user will need to do this themselves for any marshalers they define.
    • The runtime receives a table of type➔marshaler mappings at startup, and this table is generated by WasmAppBuilder.
    • The ideal form of this in the future will be a source generator that automatically finds marshalers and generates the linker configuration + table, but right now this is not possible (source generators can't do the things necessary for this yet).
  • The required methods are a JS➔Managed mapping and a Managed➔JS mapping. The types used by these methods have to (at present) be basic types that can cross through the bindings layer without assistance, i.e. double or string. In the future I hope to expand on this to allow marshaling spans or arrays.
  • In addition to the direct JS⬌Managed mapping, you can optionally define additional methods that return JS expression filters. Those expression filters will be evaluated on the JS side of the boundary to process the value - for example since neither JS Date or C# DateTime can cross the bindings boundary directly, you can marshal it as a string and then use a filter to map it to/from the JS Date type.
    • TODO: It would be ideal to expose these as const string instead of function getters, but right now the only straightforward way to get at managed values is through function calls.
  • Alongside all this we add a new 'a' type specifier for method signatures (the strings you pass to call_static_method, bind_static_method etc) that basically means 'figure it out'. When you use this specifier, the bindings layer will examine the target method and identify the best fit for that parameter. This is meant for the purpose of passing user-defined types so you don't need to think about whether they're classes or structs, but it also can be useful in other cases.
    • The downside is that any signature using this type specifier ends up being method-specific, so you will end up with method-specific generated code instead of the existing generated code that is shared by all methods with a given signature. There's room for some improvement here.

More detailed notes:

  • Before this PR we had zero support for passing structs across the JS⬌Managed boundary in any circumstances (other than DateTime).
  • The JS➔Managed conversion has to box the resulting value (if it's a struct), which introduces some unpleasant overhead. This ends up being less inefficient than you'd think (because currently we have to box all values being passed to managed methods from JS - not really sure why) but it's still an opportunity for performance improvement between eliminating some copies and ensuring the GC isn't involved.
  • The introduction of support for marshaling custom classes introduces a bit of overhead for classes without custom marshaling implementations, because of the additional check added to the existing Managed➔JS flow. However, the new check is extremely cheap and the existing flow was very slow 😊
  • There are corner cases where this new system will not run (throwing a runtime exception) or will otherwise fail (for example if the signature of your methods is weird you can get garbage values). I think many of these can be tightened up if we write more test cases and add more error handling and checks to match.
  • Many scenarios might call for an automated way to pass JSON blobs or raw byte data across the boundary. You can at least use filters (i.e. JSON.parse(value)) to simplify the former, but the latter really demands integrated support for passing Spans around and I have no idea how we'd do this in the existing setup.
  • We could consider having some sort of default fallback marshaling implementation based on BinaryFormatter or something like that, but it seems out of scope. This system is at least relatively easy to extend to do that as a fallback.

Also in this PR:

  • box_js_obj_with_converter API that allows you to request a specific managed type when boxing a JS value (this is how you create a type like Uri explicitly)
  • Caching for the existing 'find class in assembly' APIs, so that they can be used efficiently at runtime when trying to box values of a specific type
  • Caching for bind_method so that automated tests and benchmarks can safely bind methods at the start of each run without running out of runtime memory or incurring significant performance overhead (this was causing benchmarks to fail, because i had optimized the bindings so much that the run count in BDN was skyrocketing)
  • Additional test coverage for some of the existing code, like being able to pass JS Dates. I think that was actually broken, but Blazor works so whatever...
  • Miscellaneous performance improvements, producing around a ~4-6% performance gain on my benchmarks for existing call paths
  • Miscellaneous bug fixes and additional assertions in runtime/driver code (for things that previously would have produced crashes)
  • Better debug information for generated code so it's easier to examine in browser debuggers and profilers

@kg kg added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) arch-wasm WebAssembly architecture labels Jan 29, 2021
@ghost
Copy link

ghost commented Jan 29, 2021

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR introduces a system for defining custom JS➔Managed and Managed➔JS marshaling primitives so that any* user-defined type can cross between environments seamlessly. The ideal end goal is to use this new system for marshaling types like Uri, DateTime and Task so that they can be linked out if they are not used.

First, some sample code:

        public struct CustomDate {
            public DateTime Date;

            private static string JSToManaged_PreFilter () => "value.toISOString()";
            private static string ManagedToJS_PostFilter () => "new Date(value)";

            private static CustomDate JSToManaged (string s) {
                return new CustomDate { 
                    Date = DateTime.Parse(s).ToUniversalTime()
                };
            }

            private static string ManagedToJS (ref CustomDate cd) {
                return cd.Date.ToString("o");
            }
        }

General overview:

  • The author of the type exposes static methods with specific names and signatures on the type so that the wasm bindings layer can find them.
    • TODO: We need some straightforward way to ensure that the linker doesn't remove these methods from wasm builds unless the type itself is not referenced
  • The required methods are a JS➔Managed mapping and a Managed➔JS mapping. The types used by these methods have to (at present) be basic types that can cross through the bindings layer without assistance, i.e. double or string. In the future I hope to expand on this to allow marshaling spans or arrays.
  • In addition to the direct JS⬌Managed mapping, you can optionally define additional methods that return JS expression filters. Those expression filters will be evaluated on the JS side of the boundary to process the value - for example since neither JS Date or C# DateTime can cross the bindings boundary directly, you can marshal it as a string and then use a filter to map it to/from the JS Date type.
    • TODO: It would be ideal to expose these as const string instead of function getters, but right now the only straightforward way to get at managed values is through function calls.
  • Alongside all this we add a new 'a' type specifier for method signatures (the strings you pass to call_static_method, bind_static_method etc) that basically means 'figure it out'. When you use this specifier, the bindings layer will examine the target method and identify the best fit for that parameter. This is meant for the purpose of passing user-defined types so you don't need to think about whether they're classes or structs, but it also can be useful in other cases.
    • The downside is that any signature using this type specifier ends up being method-specific, so you will end up with method-specific generated code instead of the existing generated code that is shared by all methods with a given signature. There's room for some improvement here.

More detailed notes:

  • Before this PR we had zero support for passing structs across the JS⬌Managed boundary in any circumstances (other than DateTime).
  • The JS➔Managed conversion has to box the resulting value (if it's a struct), which introduces some unpleasant overhead. This ends up being less inefficient than you'd think (because currently we have to box all values being passed to managed methods from JS - not really sure why) but it's still an opportunity for performance improvement between eliminating some copies and ensuring the GC isn't involved.
  • The introduction of support for marshaling custom classes introduces a bit of overhead for classes without custom marshaling implementations, because of the additional check added to the existing Managed➔JS flow. However, the new check is extremely cheap and the existing flow was very slow 😊
  • There are corner cases where this new system will not run (throwing a runtime exception) or will otherwise fail (for example if the signature of your methods is weird you can get garbage values). I think many of these can be tightened up if we write more test cases and add more error handling and checks to match.
  • Many scenarios might call for an automated way to pass JSON blobs or raw byte data across the boundary. You can at least use filters (i.e. JSON.parse(value)) to simplify the former, but the latter really demands integrated support for passing Spans around and I have no idea how we'd do this in the existing setup.
  • We could consider having some sort of default fallback marshaling implementation based on BinaryFormatter or something like that, but it seems out of scope. This system is at least relatively easy to extend to do that as a fallback.

Also in this PR:

  • Additional test coverage for some of the existing code, like being able to pass JS Dates. I think that was actually broken, but Blazor works so whatever...
Author: kg
Assignees: -
Labels:

* NO MERGE *, arch-wasm

Milestone: -

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@kg
Copy link
Member Author

kg commented Jan 29, 2021

To help with understanding some of what this machinery does, here's an example of some of the code generated by the bindings to facilitate calling C# from JS:
https://gist.github.com/kg/f2c350b007d8e783085124c5d03f508e

@lewing
Copy link
Member

lewing commented Jan 30, 2021

Failure is relevant

@radical
Copy link
Member

radical commented Feb 2, 2021

This breaks debugger tests with * Assertion: should not be reached at /Users/radical/dev/runtime/src/mono/mono/metadata/class-accessors.c:86.
To reproduce: $ make -C src/mono/wasm run-debugger-tests TEST_FILTER=DebuggerTests.ArrayTests.InvalidArrayId

It fails at this.async_method = Module.mono_bind_static_method ("[debugger-test] Math/NestedInMath:AsyncTest");
(https://github.com/dotnet/runtime/blob/master/src/mono/wasm/debugger/tests/debugger-test/debugger-driver.html#L14)

The method in question: public static async System.Threading.Tasks.Task<bool> AsyncTest(string s, int i)
(https://github.com/dotnet/runtime/blob/master/src/mono/wasm/debugger/tests/debugger-test/debugger-test.cs#L139)

.. is in a nested class Math.NestedInMath.

@kg
Copy link
Member Author

kg commented Feb 2, 2021

Thanks for digging in. I bet it's due to generics.

@kjpou1
Copy link
Contributor

kjpou1 commented Feb 5, 2021

Not sure if it is generics or not but I had to add some code for handling generic signatures in the PR here: #47519

This allows for generic signatures.

@kg kg force-pushed the bindings-new-marshaler-api branch from 2a4b472 to c8554ec Compare February 5, 2021 12:19
@kjpou1
Copy link
Contributor

kjpou1 commented Feb 10, 2021

Just running through the DRAFT PR:

info: class_is_task
  info: class_is_task
  info: creating signature info for method result
  info: creating signature info for method params
  fail: [out of order message from the browser]: http://127.0.0.1:61482/dotnet.wasm 0:1703801 Uncaught RuntimeError: divide by zero
  info: GC_MINOR: (Nursery full) time 2.40ms, stw 2.44ms promoted 259K major size: 1280K in use: 687K los size: 0K in use: 0K
  info: GC_MINOR: (Nursery full) time 2.41ms, stw 2.43ms promoted 0K major size: 1280K in use: 687K los size: 0K in use: 0K
  info: GC_MINOR: (Nursery full) time 3.01ms, stw 3.03ms promoted 0K major size: 1280K in use: 687K los size: 0K in use: 0K
  info: GC_MINOR: (Nursery full) time 2.30ms, stw 2.32ms promoted 0K major size: 1280K in use: 687K los size: 0K in use: 0K

@kg kg force-pushed the bindings-new-marshaler-api branch 2 times, most recently from d4fef0c to 7d09a5d Compare February 25, 2021 03:15
Base automatically changed from master to main March 1, 2021 09:07
@kg kg force-pushed the bindings-new-marshaler-api branch 2 times, most recently from a439856 to 52d3615 Compare March 19, 2021 04:10
@kg kg force-pushed the bindings-new-marshaler-api branch from 8897689 to 990a19f Compare March 23, 2021 20:16
@ghost ghost closed this Apr 26, 2021
@ghost
Copy link

ghost commented Apr 26, 2021

Draft Pull Request was automatically closed for inactivity. It can be manually reopened in the next 30 days if the work resumes.

@kg kg reopened this May 4, 2021
@kg kg force-pushed the bindings-new-marshaler-api branch from 4f05f98 to fd96dc3 Compare May 7, 2021 02:19
@kg kg force-pushed the bindings-new-marshaler-api branch from 5eca689 to 86b404b Compare May 20, 2021 22:29
@lewing
Copy link
Member

lewing commented May 21, 2021

@kg are you still missing some changes?

@kg
Copy link
Member Author

kg commented May 22, 2021

@kg are you still missing some changes?

I'm still working through the msbuild integration in a way that won't break tests

@kg
Copy link
Member Author

kg commented Nov 10, 2021

Here are a set of generated marshaler functions, @pavelsavara

//# sourceURL=https://mono-wasm.invalid/System_DateTime$FromJavaScript
"use strict";
const Module = __closure__.Module;
const mono_wasm_new_root = __closure__.mono_wasm_new_root;
const _create_temp_frame = __closure__._create_temp_frame;
const _get_args_root_buffer_for_method_call = __closure__._get_args_root_buffer_for_method_call;
const _get_buffer_for_method_call = __closure__._get_buffer_for_method_call;
const _handle_exception_for_call = __closure__._handle_exception_for_call;
const _teardown_after_call = __closure__._teardown_after_call;
const mono_wasm_try_unbox_primitive_and_get_type = __closure__.mono_wasm_try_unbox_primitive_and_get_type;
const _unbox_mono_obj_root_with_known_nonprimitive_type = __closure__._unbox_mono_obj_root_with_known_nonprimitive_type;
const invoke_method = __closure__.invoke_method;
const method = __closure__.method;
const this_arg = __closure__.this_arg;
const token = __closure__.token;
const unbox_buffer = __closure__.unbox_buffer;
const unbox_buffer_size = __closure__.unbox_buffer_size;
const getI32 = __closure__.getI32;
const getU32 = __closure__.getU32;
const getF32 = __closure__.getF32;
const getF64 = __closure__.getF64;
const converter_d_result_unmarshaled = __closure__.converter_d_result_unmarshaled;

function System_DateTime$FromJavaScript(arg0) {
    _create_temp_frame();
    let resultRoot = token.scratchResultRoot, exceptionRoot = token.scratchExceptionRoot;
    token.scratchResultRoot = null;
    token.scratchExceptionRoot = null;
    if (resultRoot === null)
        resultRoot = mono_wasm_new_root();
    if (exceptionRoot === null)
        exceptionRoot = mono_wasm_new_root();

    let argsRootBuffer = _get_args_root_buffer_for_method_call(converter_d_result_unmarshaled, token);
    let scratchBuffer = _get_buffer_for_method_call(converter_d_result_unmarshaled, token);
    let buffer = converter_d_result_unmarshaled.compiled_function(
        scratchBuffer, argsRootBuffer, method,
        arg0
    );
    let is_result_marshaled = false;

    resultRoot.value = invoke_method(method, this_arg, buffer, exceptionRoot.get_address());
    _handle_exception_for_call(converter_d_result_unmarshaled, token, buffer, resultRoot, exceptionRoot, argsRootBuffer);

    let resultPtr = resultRoot.value, result = undefined;
        result = resultPtr;
    _teardown_after_call(converter_d_result_unmarshaled, token, buffer, resultRoot, exceptionRoot, argsRootBuffer);
    return result;
};
return System_DateTime$FromJavaScript;


//# sourceURL=https://mono-wasm.invalid/interchange_filter_for_type62173044
"use strict";
const Module = __closure__.Module;
const MONO = __closure__.MONO;
const BINDING = __closure__.BINDING;
const typePtr = __closure__.typePtr;
const alloca = __closure__.alloca;
const getI8 = __closure__.getI8;
const getI16 = __closure__.getI16;
const getI32 = __closure__.getI32;
const getI64 = __closure__.getI64;
const getU8 = __closure__.getU8;
const getU16 = __closure__.getU16;
const getU32 = __closure__.getU32;
const getF32 = __closure__.getF32;
const getF64 = __closure__.getF64;
const setI8 = __closure__.setI8;
const setI16 = __closure__.setI16;
const setI32 = __closure__.setI32;
const setI64 = __closure__.setI64;
const setU8 = __closure__.setU8;
const setU16 = __closure__.setU16;
const setU32 = __closure__.setU32;
const setF32 = __closure__.setF32;
const setF64 = __closure__.setF64;
const System_DateTime$FromJavaScript = __closure__.System_DateTime$FromJavaScript;

function interchange_filter_for_type62173044(value) {

    switch (typeof (value)) {
        case 'number':
            return value;
        default:
            if (value instanceof Date) {
                return value.valueOf();
            } else
                throw new Error('Value must be a number (msecs since unix epoch), or a Date');
    }

};
return interchange_filter_for_type62173044;


//# sourceURL=https://mono-wasm.invalid/js_to_interchange_for_type62173044
"use strict";
const Module = __closure__.Module;
const MONO = __closure__.MONO;
const BINDING = __closure__.BINDING;
const typePtr = __closure__.typePtr;
const alloca = __closure__.alloca;
const getI8 = __closure__.getI8;
const getI16 = __closure__.getI16;
const getI32 = __closure__.getI32;
const getI64 = __closure__.getI64;
const getU8 = __closure__.getU8;
const getU16 = __closure__.getU16;
const getU32 = __closure__.getU32;
const getF32 = __closure__.getF32;
const getF64 = __closure__.getF64;
const setI8 = __closure__.setI8;
const setI16 = __closure__.setI16;
const setI32 = __closure__.setI32;
const setI64 = __closure__.setI64;
const setU8 = __closure__.setU8;
const setU16 = __closure__.setU16;
const setU32 = __closure__.setU32;
const setF32 = __closure__.setF32;
const setF64 = __closure__.setF64;
const System_DateTime$FromJavaScript = __closure__.System_DateTime$FromJavaScript;
const filter = __closure__.filter;

function js_to_interchange_for_type62173044(value, method, parmIdx) {
    let filteredValue = filter(value);
    let convertedResult = System_DateTime$FromJavaScript(filteredValue, method, parmIdx);
    return convertedResult;
};

return js_to_interchange_for_type62173044;

@kg
Copy link
Member Author

kg commented Nov 11, 2021 via email

@kg
Copy link
Member Author

kg commented Nov 20, 2021

PR updated with support for passing Span<byte> and ReadOnlySpan<byte> arguments to methods using the new 'b' signature character, and marshalers now explicitly request a scratch buffer instead of manually allocating it using alloca, which allows them to take Span arguments instead. Note that returning spans is not valid, since return values have to be boxed. I don't see this as a problem.

@azure-pipelines
Copy link

Pull request contains merge conflicts.

@kg kg force-pushed the bindings-new-marshaler-api branch 2 times, most recently from ceba251 to def3119 Compare December 8, 2021 23:51
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kg
Copy link
Member Author

kg commented Dec 9, 2021

/azp run runtime-manual

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kg kg force-pushed the bindings-new-marshaler-api branch 3 times, most recently from 221f4c4 to b23df27 Compare December 15, 2021 00:27
@kg
Copy link
Member Author

kg commented Dec 15, 2021

/azp run runtime-manual

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kg
Copy link
Member Author

kg commented Dec 15, 2021

/azp run runtime-manual

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kg kg force-pushed the bindings-new-marshaler-api branch 2 times, most recently from 8afdc4a to 0deb1c2 Compare December 21, 2021 00:12
kg added 3 commits January 5, 2022 21:02
Add test coverage for date marshaling
Add uri test
Disable some log statements
Returning structs works
Returning custom classes works
Add null checks to describe_value
Remove invalid cwrap
Add trimming annotations
Restore old benchmark html
Reuse result and exception roots
Cleaner codegen
Disambiguate converters in devtools
Transition back to static methods, add signature checks and more error handling
Implement a simple bound method cache so that automated tests don't exhaust the scratch root buffer; make the scratch root buffer smaller
Fix a LMF leak
Fix C warnings
Hard-coded table of marshalers is no longer needed
Shorter wrapper function names for better stacks
Add linker exclusions for custom marshalers
Hack around bug in linker/runtime that causes System.Uri forwarder to break
Rework class lookup
Support passing numeric values as DateTime
Transition some null returns to asserts
Clean up conditionals
Don't use ISO strings
Change pre/post filter syntax to require an explicit "return" so it can contain multiple statements
Allow ToJavaScript to accept a pointer instead of an 'in' reference
Support managed pointer return/parameter types in more places
Add marshal_type for pointers
Add tests verifying that you can marshal structs by address and then unpack them
Initial gc safety work
Fix formatting rule
Rename pre/post filters
Fix type error in driver.c
remove _pick_result_chara_for_marshal_type
Add library build descriptor to ensure marshalers are not stripped when the BCL is trimmed
Attempt to fix the linker stripping test code
Better version of no-configured-marshaler warning
Annotate SetupJSContinuation
Annotate safehandle APIs and add null checks
Update targets file to pass custom marshaler msbuild items through to the helix proxy project (requires another change to work)
Add tests for Task and ValueTask marshaling
Fix unboxing for generic structs
Rebase cleanups
Correct datetime test to use UTC for comparison
Eliminate use of MONO. and BINDING. in closures
Optimize out a js->c call
Normalize some APIs to take MonoType instead of MonoClass
Repair merge damage
Address PR feedback
Move some types around
Repair merge damage
Type system fixes
Rework create_named_function so that it can handle larger numbers of closure keys more efficiently
Remove unnecessary test instrumentation
Fix closure variables being generated in the wrong place
Use a single memory slab for temp_malloc
Checkpoint span support
Fix unbuffered filters, support ReadOnlySpan
Fix auto signatures for primitives and add test
Use 4-chara unicode escapes since the x escapes are not officially permitted in JSON
Checkpoint C# implementation of converter generator
Align everything by 8 when constructing argument buffers because if you don't do that, the runtime passes corrupt data to C# functions
Don't shove raw mono pointers into root buffers since we weren't doing it before (it might be nice to do it though)
Fully transition over to having C# generate signature converters
C# implementation of bind_method codegen
Clean up the bindings named closure table management
Add some comments
Don't allocate a root for the this-ref when binding methods if the this-ref is null
Don't generate dead code for void signatures
Remove some dead code and refactor bound method generator
Generate more specialized result handling code for bound methods. Align C# and C's idea of marshal types better.
Fix GC safety issue
Fix tsc wasting 70 seconds on every build
Add basic build profiling information
Fix root index being incorrect
Rename MemOffset and MemValue
Remove bind_method this_arg support
Move some stuff around
Change nested type descriptor syntax
…m hiding other errors, and avoid double teardown when managed code throws
@kg kg force-pushed the bindings-new-marshaler-api branch from fd1dc81 to 85eb349 Compare January 6, 2022 07:10
@kg kg closed this May 25, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-wasm WebAssembly architecture area-Interop-mono NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants