Host <-> Sandbox interop #315

Closed
ElusiveMori opened this issue Mar 30, 2019 · 11 comments
Labels
❓ question I've a question!

Comments

@ElusiveMori

Summary

I've been thinking lately about how to use Wasmer as a general-purpose embedder for sandboxed scripting (I'm sure I'm not the only one).

However, there is that venerable problem that, at this point in time, it's not exactly easy to pass around data across the Wasm boundary.

There is, of course, wasm-bindgen. However, as far as I can tell, it is specifically tailored for the JS <-> Rust Wasm use case. It generates the necessary Rust boilerplate on the Wasm side and complements it with some JS boilerplate on the JS side. Its under-the-hood operations are also non-trivial (you can see just how non-trivial by running cargo-expand on one of their "small" examples).

Unless I am missing something, this means that we can't really rely on wasm-bindgen for the use-case where the host application is not a browser (such as with Wasmer), and we need some other solution, and as far as I know, there isn't any at the moment.

So, I'm curious if there's any discussion and/or plans regarding this, both in Wasmer and in the broader Wasm ecosystem, especially now with the growing interest of using Wasm as a general-purpose target rather than Web-only.

Extra Thoughts

There's also the host bindings and reference types proposals, which might make this a little bit easier, but they are still going to take some time to go through. Not to mention that I'm really just a pleb and don't really have any idea what I'm talking about. What can we do now to make our lives easier?

As a simple case study, for now it might be useful to examine just Rust (Native) <-> Rust (Wasm) interop in the scope of Wasmer. For example:

Passing a struct (by value) Native -> Wasm:

  1. Native requests Wasm to allocate the necessary amount of memory for the struct. Let's use #[repr(C)] for a stable representation.
  2. Wasm returns a pointer to the allocated memory within its linear memory to the Native side.
  3. Native side gets the instance's memory, grabs the returned pointer, and writes the struct to its location.
  4. Native then calls the actual function that would use this struct in Wasm.
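
To make the four steps above concrete, here is a minimal sketch that runs without any Wasm runtime. It assumes a plain byte buffer standing in for the instance's linear memory and a toy bump allocator standing in for the guest's allocator. `FakeGuest`, `Point`, `write_point`, and `sum_point` are hypothetical names made up for this illustration and are not part of Wasmer.

```rust
/// Stand-in for a Wasm instance: a byte buffer as "linear memory" plus a bump allocator.
/// Hypothetical type for illustration only.
pub struct FakeGuest {
    pub memory: Vec<u8>,
    next_free: u32,
}

/// The struct passed by value. `#[repr(C)]` pins the layout: `x` at offset 0, `y` at offset 4.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Reads a little-endian i32 from `mem` at byte offset `at`.
fn read_i32(mem: &[u8], at: usize) -> i32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&mem[at..at + 4]);
    i32::from_le_bytes(raw)
}

impl FakeGuest {
    pub fn new(size: usize) -> Self {
        // Start handing out memory at offset 8 so that offset 0 can act as "null".
        FakeGuest { memory: vec![0; size], next_free: 8 }
    }

    /// Steps 1 and 2: the "guest" reserves `len` bytes and returns their offset.
    pub fn alloc(&mut self, len: u32) -> Option<u32> {
        let start = self.next_free;
        let end = start.checked_add(len)?;
        if end as usize > self.memory.len() {
            return None;
        }
        // Round the next free offset up to a multiple of 8.
        self.next_free = end.checked_add(7)? & !7;
        Some(start)
    }

    /// Step 4: the "guest" function that consumes the struct from its own memory.
    pub fn sum_point(&self, ptr: u32) -> i32 {
        let at = ptr as usize;
        read_i32(&self.memory, at) + read_i32(&self.memory, at + 4)
    }
}

/// Step 3: the host writes the struct's fields into guest memory as little-endian bytes.
pub fn write_point(guest: &mut FakeGuest, ptr: u32, point: Point) {
    let at = ptr as usize;
    guest.memory[at..at + 4].copy_from_slice(&point.x.to_le_bytes());
    guest.memory[at + 4..at + 8].copy_from_slice(&point.y.to_le_bytes());
}
```

With a real runtime, `alloc` and `sum_point` would be exported guest functions, and step 3 would go through whatever memory access the runtime offers.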

However, this use-case is right now impossible in Wasmer, because Wasmer only exposes immutable access to an instance's linear memory. There's no (safe?) way to do this.

The wasmer_runtime::Ctx struct has the data and data_finalizer fields, though no documentation explaining their purpose (I suspect it's a raw pointer to the linear memory). Perhaps with some unsafe glue it'd be possible?

Anyway, I'm extremely curious whether there are any plans for Wasmer to expand in this direction, and whether I should try and dabble in this myself, or wait for the project to mature some more. Thanks!

@ElusiveMori
Author

I did some extra research, in particular on how wasm-bindgen does this, and their approach is, indeed, to allocate some data within the wasm linear memory, and then copy the data into it from the host side.

From https://rustwasm.github.io/docs/wasm-bindgen/contributing/design/exporting-rust.html :

function passStringToWasm(arg) {
  const buf = new TextEncoder('utf-8').encode(arg);
  const len = buf.length;
  const ptr = wasm.__wbindgen_malloc(len);
  let array = new Uint8Array(wasm.memory.buffer);
  array.set(buf, ptr);
  return [ptr, len];
}

This means that to do the same we'd need some kind of mutable access to a module instance's linear memory.
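
For comparison, a rough Rust analogue of the `passStringToWasm` helper might look like the sketch below. It assumes the host already has mutable access to the instance's memory as a `&mut [u8]` and some way to call the guest's allocator, modelled here as a closure. `pass_string_to_wasm` and `read_string` are hypothetical helper names, not functions from wasm-bindgen or Wasmer.

```rust
/// Copies `text` into `memory` at an offset chosen by `malloc` and returns `(ptr, len)`.
/// `malloc` stands in for a call to the guest's exported allocator.
pub fn pass_string_to_wasm(
    memory: &mut [u8],
    mut malloc: impl FnMut(u32) -> u32,
    text: &str,
) -> Option<(u32, u32)> {
    let bytes = text.as_bytes();
    // Wasm32 offsets and lengths are 32-bit, so refuse anything larger.
    if bytes.len() > u32::MAX as usize {
        return None;
    }
    let len = bytes.len() as u32;
    let ptr = malloc(len);
    let start = ptr as usize;
    let end = start.checked_add(bytes.len())?;
    // `get_mut` returns None if the allocator handed back an out-of-bounds range.
    memory.get_mut(start..end)?.copy_from_slice(bytes);
    Some((ptr, len))
}

/// Reads the string back, as the receiving side would, validating bounds and UTF-8.
pub fn read_string(memory: &[u8], ptr: u32, len: u32) -> Option<&str> {
    let start = ptr as usize;
    let end = start.checked_add(len as usize)?;
    std::str::from_utf8(memory.get(start..end)?).ok()
}
```

The `(ptr, len)` pair is what then gets passed to the guest function as two integer arguments.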

@lachlansneff
Contributor

lachlansneff commented Apr 2, 2019

@Samuelmoriarty Wasmer does let you mutably access the instance's linear memory. The view API on Memory returns a type that derefs to &[Cell<T>], so you can call set and get on its elements.

We're also working on better APIs for accessing the Wasm linear memory, some of which are being tested for implementing the WASI ABI.
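
As an illustration of why a `&[Cell<T>]` allows writes through a shared reference, here is a small sketch using only the standard library. It does not use Wasmer's `Memory` or its view type. The `write_bytes` and `read_bytes` helpers are hypothetical, and the slice of cells is built from an ordinary buffer.

```rust
use std::cell::Cell;

/// Writes `bytes` at `offset` through a shared slice of cells.
/// Returns false (and writes nothing) if the range is out of bounds.
pub fn write_bytes(view: &[Cell<u8>], offset: usize, bytes: &[u8]) -> bool {
    let end = match offset.checked_add(bytes.len()) {
        Some(end) => end,
        None => return false,
    };
    match view.get(offset..end) {
        Some(cells) => {
            // `Cell::set` only needs `&self`, so no `&mut` access to the slice is required.
            for (cell, byte) in cells.iter().zip(bytes) {
                cell.set(*byte);
            }
            true
        }
        None => false,
    }
}

/// Copies `len` bytes starting at `offset` out of the view.
pub fn read_bytes(view: &[Cell<u8>], offset: usize, len: usize) -> Option<Vec<u8>> {
    let end = offset.checked_add(len)?;
    Some(view.get(offset..end)?.iter().map(Cell::get).collect())
}
```

In plain Rust, a `&[Cell<u8>]` can be obtained from a mutable buffer with `Cell::from_mut(&mut buf[..]).as_slice_of_cells()`, which is handy for trying the helpers without a runtime.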

@bjfish
Contributor

bjfish commented Apr 2, 2019

@Samuelmoriarty The direction of host bindings has recently turned to focus on targeting WebIDL as a binding target:
WebAssembly/interface-types#21

wasm-bindgen has a few libraries that take the approach of generating bindings from WebIDL bindings: https://rustwasm.github.io/docs/wasm-bindgen/contributing/web-sys/overview.html?highlight=webidl#buildrs

As this is the direction of the wasm community, Wasmer would target supporting this integration in the future.

The data/data_finalizer fields are a way for wasm hosts to provide a pointer to their own data which is available to host functions.
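
A rough sketch of that pattern, using a mock context rather than the real `wasmer_runtime::Ctx`: the host stores a type-erased pointer to its own state together with a finalizer that knows how to free it, and host functions cast the pointer back when called. `MockCtx`, `HostState`, and `count_call` are hypothetical names, and the field types are an assumption modelled on the description above.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for a runtime context carrying host-owned data.
pub struct MockCtx {
    pub data: *mut c_void,
    pub data_finalizer: Option<fn(*mut c_void)>,
}

/// Whatever state the host wants its host functions to see.
pub struct HostState {
    pub calls: u32,
}

/// Finalizer: reclaims the boxed `HostState` behind the type-erased pointer.
fn drop_host_state(data: *mut c_void) {
    // Safety: `data` was produced by `Box::into_raw` in `MockCtx::with_state`.
    unsafe { drop(Box::from_raw(data as *mut HostState)) };
}

impl MockCtx {
    pub fn with_state(state: HostState) -> Self {
        MockCtx {
            data: Box::into_raw(Box::new(state)) as *mut c_void,
            data_finalizer: Some(drop_host_state),
        }
    }
}

impl Drop for MockCtx {
    fn drop(&mut self) {
        // Run the finalizer exactly once when the context goes away.
        if let Some(finalizer) = self.data_finalizer.take() {
            finalizer(self.data);
        }
    }
}

/// A host function: recovers the host state from the context and updates it.
pub fn count_call(ctx: &mut MockCtx) -> u32 {
    // Safety: `data` points to a live `HostState` for as long as `ctx` exists.
    let state = unsafe { &mut *(ctx.data as *mut HostState) };
    state.calls += 1;
    state.calls
}
```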

@ElusiveMori
Author

@lachlansneff Ah I see! In the docs it says that the .memory() method on Instance returns an immutable view of the linear memory, so I assumed there was no way to modify it. My bad.

Does the introduction of WASI support imply that we'll get some kind of bindgen mechanism for custom bindings, or am I reading this wrong?

@bjfish Sorry, but I'm not sure I quite understand the implications of using WebIDL for bindgen. As far as I can tell, it is specific to usage in web browsers. Can we still use that infrastructure for non-web usecases?

@bjfish
Contributor

bjfish commented Apr 2, 2019

@Samuelmoriarty AFAIK WebIDL is not specific to browsers, just like WebAssembly. It is just an interface description language:
https://developer.mozilla.org/en-US/docs/Glossary/WebIDL
This section describes WebIDL outside of the browser:
https://github.com/WebAssembly/host-bindings/blob/ba75b911ae9455dd4e49669b512e77dbebeaaa28/proposals/webidl-bindings/Explainer.md#what-about-bindings-for-non-web-idl-apis

@ElusiveMori
Author

@bjfish The way I'm understanding the section you linked seems to imply that it is not meant for non-Web scenarios.

For the case of pre-existing APIs (e.g., Java class libraries or C APIs or RPC IDLs), the APIs' interfaces will likely have very different idioms and types used in signatures, thereby requiring a binding section with very different types and binding operators from the ones presented above.

In contrast, having a separate binding section specification for each separate idiomatic family of APIs would allow each to be ideally suited

Moreover, the WebIDL specification specifies several Web/JS specific types such as object, DOMString, Promise, etc. which don't necessarily even make sense in the context of an e.g. Rust <-> Rust interaction, or, for that matter, any environment where the host is not JS. There doesn't seem to be much interest there to use WebIDL outside of browsers.

@bjfish
Contributor

bjfish commented Apr 2, 2019

@Samuelmoriarty I found that section of the documentation a little difficult to understand, so I asked for clarification. Please see the thread here: WebAssembly/interface-types#21 (comment)

Yes, there's nothing that depends here on JavaScript or a browser environment; as long as a host can meaningfully define what Web IDL values mean and how to compile pure-wasm calls to host-calls taking said Web IDL values, you're good.

@ElusiveMori
Author

I suppose that answers my questions then. Thanks!

@andrewdavidmackenzie

@ElusiveMori did you get anything working for this?
I am (still) using wasmi and asked a similar question there:
wasmi-labs/wasmi#203

BTW: I am using serde to serialize to linear memory and hoped to use serde (Rust, compiled to wasm) on the wasm side to deserialize. This is not very performant, as you can imagine. But it is portable across machines, and my data structs may be exchanged between machines and are already in serde serialized format.

It's knowing how to pass the array of bytes across the wasm boundary I'm stuck with, reading the linear memory on wasm side from rust code with wasmi in particular.
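
For the guest-side half of that problem, one possible shape is sketched below. It assumes the host has already copied the serialized bytes into the guest's linear memory and passes their address and length as two integers. Inside the guest, linear memory is just the ordinary address space, so the bytes can be viewed as a slice. `consume_bytes` and `checksum` are hypothetical names. A real export would also need to be marked so the symbol is kept (for example with `no_mangle`), and the checksum stands in for a real deserialization step.

```rust
/// Guest-side entry point (sketch): views `len` bytes at `ptr` as a slice and processes them.
///
/// # Safety
/// `ptr` must point to `len` readable, initialized bytes that stay valid for the call.
pub unsafe extern "C" fn consume_bytes(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    // Reinterpret the (ptr, len) pair handed over by the host as a byte slice.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    // A deserializer would be called on `bytes` here; a checksum keeps the sketch small.
    checksum(bytes)
}

/// Wrapping sum of all bytes, used as a placeholder for real processing.
pub fn checksum(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, byte| acc.wrapping_add(u32::from(*byte)))
}
```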

If someone has this working or example code in wasmer then I would switch.

Thanks for any pointers.

@MarkMcCaskey
Contributor

MarkMcCaskey commented Aug 26, 2019

Hi @andrewdavidmackenzie ,

Yeah, I'm currently writing a blog post about using Wasmer embedded in (Rust) applications; I can probably help you out here.

The answer to the original question is very complicated: This proposal has been renamed from WebIDL -> ⛄️ bindings -> WebAssembly Interface Types. It's very unstable and not generally usable yet, but we'll expose it as an opt-in feature soon. The current best solution is still to manually do it and it generally involves lots of copying. We have an abstraction in wasmer_runtime_core::memory::ptr called WasmPtr which makes dealing with it on the Host side in Rust a bit nicer and much safer.
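
To give a feel for what "a bit nicer and much safer" means, here is a simplified sketch of the idea behind a typed pointer into linear memory: a 32-bit offset that is checked for bounds and alignment before any access. This is not Wasmer's `WasmPtr`. `Offset`, `checked_range`, and `read_le` are hypothetical names for illustration.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};
use std::ops::Range;

/// Hypothetical typed offset into a linear memory; only the u32 is stored.
pub struct Offset<T> {
    offset: u32,
    _marker: PhantomData<T>,
}

impl<T> Offset<T> {
    pub fn new(offset: u32) -> Self {
        Offset { offset, _marker: PhantomData }
    }

    /// Returns the byte range a `T` would occupy, or None if it is out of bounds
    /// or the offset is not a multiple of `T`'s alignment.
    pub fn checked_range(&self, memory_len: usize) -> Option<Range<usize>> {
        let start = self.offset as usize;
        let end = start.checked_add(size_of::<T>())?;
        if end > memory_len || start % align_of::<T>() != 0 {
            return None;
        }
        Some(start..end)
    }
}

impl Offset<u32> {
    /// Reads a little-endian u32 after validating the offset.
    pub fn read_le(&self, memory: &[u8]) -> Option<u32> {
        let range = self.checked_range(memory.len())?;
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&memory[range]);
        Some(u32::from_le_bytes(raw))
    }
}
```

The point of the wrapper is that a bad offset coming from the guest turns into a `None` on the host instead of an out-of-bounds access.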

The copying won't be solved by WebAssembly Interface Types; it can be partially solved with something called reference types (which used to be part of the Garbage Collection proposal but maybe isn't now 🤷‍♂ ).

So the way this looks with Wasmer is something like:

Host side:

use serde::{Deserialize, Serialize};
use wasmer_runtime_core::{vm::Ctx, memory::ptr::{WasmPtr, Array}};

#[derive(Debug, Serialize, Deserialize)]
pub struct Data {
    pub data: DataBlob,
    pub timestamp: u64,
    pub name_bytes: WasmPtr<u8, Array>,
    pub name_len: u32,
    ...
}

// our trait for saying that it's safe to turn this data to/from little-endian bytes
unsafe impl wasmer_runtime_core::types::ValueType for Data { }

// WasmPtr is just a u32 that does alignment and bounds checking while providing
// a convenient no-copy way to access types from Wasm linear memory
pub fn my_syscall(ctx: &mut Ctx, data: WasmPtr<Data>) -> u32 {
    let memory = ctx.memory(0);
    if let Some(d) = data.deref(memory) {
        println!("Data from {}", d.timestamp);
        io::do_something_with_data(&d.data);
        let name = d.name_bytes.get_utf8_string(memory, d.name_len).unwrap_or("Invalid name");
        println!("Finished processing {}", name);
    } else {
        return 1;
    }
    0
}

use wasmer_runtime::{func, imports, instantiate};
fn main() {
    let wasm_bytes = std::fs::read(PLUGIN_LOCATION).expect(&format!(
        "Could not read in WASM plugin at {}",
        PLUGIN_LOCATION
    ));

    let imports = imports! {
        "env" => {
            "my_syscall" => func!(my_syscall),
        },
    };
   
    let mut instance =
        instantiate(&wasm_bytes[..], &imports).expect("failed to instantiate wasm module");

    let entry_point = instance.func::<i32, ()>("entry_point").unwrap();
    let malloc = instance.func::<i32, i32>("malloc").unwrap();

    // Ask the guest to allocate room for one `Data`, then wrap the returned offset.
    let pointer = malloc.call(std::mem::size_of::<Data>() as i32).expect("failed to get memory");
    let typed_pointer: WasmPtr<Data> = WasmPtr::new(pointer as u32);
    let memory = instance.context().memory(0);

    let mut_ptr = unsafe { typed_pointer.deref_mut(memory) };

    mut_ptr.timestamp = ...;
    ...
    
    entry_point.call(pointer).expect("failed to execute plugin");
}

Guest side:

use common_data::Data;
extern "C" { fn my_syscall(data: u32) -> u32; }

fn my_syscall_wrapper(data: &Data) -> Option<()> {
   if unsafe { my_syscall(data as *const Data as usize as u32) } == 0 { Some(()) } else { None } 
}

#[no_mangle]
pub extern "C" fn malloc(size: u32) -> u32 {
    // hacky way to allocate some bytes: leak a Vec and hand out its address
    let vec: Vec<u8> = Vec::with_capacity(size as usize);
    let return_value = vec.as_ptr() as usize as u32;
    std::mem::forget(vec);

    return_value
}

#[no_mangle]
pub extern "C" fn entry_point(data_ptr: u32) {
    // In the guest, the u32 offset is an ordinary address in linear memory.
    let data = unsafe { &*(data_ptr as usize as *const Data) };
    // process data
    my_syscall_wrapper(data).unwrap();
}
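
One thing the "hacky" malloc above leaves open is how the memory gets released again. A possible way to pair allocation with deallocation is sketched here. It assumes the host remembers the size it asked for and passes it back when freeing. `guest_alloc` and `guest_dealloc` are hypothetical names, and a boxed slice is used instead of a leaked `Vec` so that the freed size matches the allocated size exactly.

```rust
/// Reserves `size` zeroed bytes and hands ownership to the caller as a raw pointer.
pub fn guest_alloc(size: usize) -> *mut u8 {
    let boxed: Box<[u8]> = vec![0u8; size].into_boxed_slice();
    // Leak the box; the caller is now responsible for calling `guest_dealloc`.
    Box::into_raw(boxed) as *mut u8
}

/// Releases a buffer previously returned by `guest_alloc`.
///
/// # Safety
/// `ptr` must come from `guest_alloc(size)` with the same `size`,
/// and must not be used or freed again afterwards.
pub unsafe fn guest_dealloc(ptr: *mut u8, size: usize) {
    // Rebuild the boxed slice from the pointer and length, then let it drop.
    let raw = std::ptr::slice_from_raw_parts_mut(ptr, size);
    drop(unsafe { Box::from_raw(raw) });
}
```

Exported from the guest, the pair would let the host free each buffer once the guest function that used it has returned.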

Fair warning: I wrote most of this from memory, all of it in this GitHub comment box, and didn't test any of it. But that's generally what it would look like. Let me know if that didn't answer your question!

@andrewdavidmackenzie

andrewdavidmackenzie commented Aug 27, 2019

I'll read your code in much more detail later, thanks very much for the help.

Meanwhile, I'm hacking around (using wasmi) with something similar in this repo, which I have evolved from someone else's work for passing strings:

https://github.com/andrewdavidmackenzie/wasmi-string

and I have extended that to passing arbitrarily complex structures using serde serialization and deserialization in https://github.com/andrewdavidmackenzie/wasm_explore

5 participants