
How to access linear memory from rust code that is compiled to wasm #203

Closed
andrewdavidmackenzie opened this issue Aug 25, 2019 · 17 comments


@andrewdavidmackenzie

andrewdavidmackenzie commented Aug 25, 2019

Hi, I'm not sure of the best place to ask this type of question, so apologies if this is not it. Just let me know where, and I'll ask there.
(this relates to a previous question of mine: #166)

I am writing an app that uses a series of "library" functions that are written in rust and compiled to wasm modules (I'll call this the 'rust-wasm' side).

The app (written in rust and compiled natively, so I'll call it the 'rust-native' side) loads and executes these modules from files using wasmi. This is working just fine.

However, I am attempting to pass some complex structs back and forth over the rust-native/rust-wasm boundary:

  • serializing to linear memory on the 'rust-native' side and setting linear memory using a wasmi function
  • loading the module and executing the exported function using wasmi
  • the function (running as wasm) reads the linear memory, deserializes the data and uses it (on the 'rust-wasm' side)

Does that make sense?

Can I use wasmi library functions to read the linear memory on the "wasm side"?

The code would be something like this, but instead of allocating the memory, it would just be getting a reference to the linear memory passed to it.

    use wasmi::{memory_units::Pages, MemoryInstance};

    let linear_memory = MemoryInstance::alloc(Pages(1), None).unwrap();
    // `get` copies the bytes out into an owned Vec<u8>
    let input_data: Vec<u8> = linear_memory.get(0, input_data_length as usize).unwrap();
    // deserialize data into struct
    // run algorithm on struct

But remember, this would be compiled to wasm, and be interpreted by wasmi.

Thanks for any help!

@pepyakin
Collaborator

This is actually a very common pattern to do!

A compiled wasm module uses linear memory as its working memory, like RAM. That means that to access linear memory from within a wasm module, you just do regular loads from memory.

For example, let's say that on the rust-native side you copy `count` bytes of data into linear memory at some address `data_ptr`, and then call the function `process_data` on the rust-wasm side.

#[no_mangle]
pub extern "C" fn process_data(data_ptr: *const u8, count: usize) {
  use core::slice;

  // Safe only if `data_ptr` is non-null and points to `count` initialized
  // bytes inside this instance's linear memory.
  let data: Vec<u8> = unsafe { slice::from_raw_parts(data_ptr, count).to_vec() };

  // .. you can use `data` to deserialize in whatever format you want.
}

@andrewdavidmackenzie
Author

Thanks, that looks interesting.

I understand that the data_ptr passed in should be a linear memory created on the rust-native side using wasmi's MemoryInstance::alloc, right?

I'm currently struggling to figure out how to get a mut pointer to index 0 of that linear memory to pass as the param.

(this won't work...)

        let linear_memory_ptr = &linear_memory as * mut u8;
        args.push(RuntimeValue::from(linear_memory));

a pointer to any working code doing this stuff already would be great!

@pepyakin
Collaborator

> I understand that the data_ptr passed in should be a linear memory created on the rust-native side using wasmi's MemoryInstance::alloc, right?

Nope. A linear memory in wasm is just that: an array of bytes indexable by a u32. Thus a pointer is just a u32 number that points to the value (or to the first item, in the case of an array).
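To make "a pointer is just a u32" concrete, here is a minimal std-only sketch that models linear memory as a plain `Vec<u8>` (an assumption for illustration; a real host would go through wasmi's memory API). Loads and stores are just indexing at that offset, and wasm stores multi-byte values little-endian.

```rust
/// Read a little-endian u32 stored at byte offset `ptr` of the simulated
/// linear memory `mem`. Returns None when the four bytes fall outside the
/// memory, where a real wasm load would trap instead.
pub fn load_u32(mem: &[u8], ptr: u32) -> Option<u32> {
    let start = ptr as usize;
    let b = mem.get(start..start.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Store `value` little-endian at byte offset `ptr`, as an `i32.store`
/// instruction would.
pub fn store_u32(mem: &mut [u8], ptr: u32, value: u32) -> Option<()> {
    let start = ptr as usize;
    let dst = mem.get_mut(start..start.checked_add(4)?)?;
    dst.copy_from_slice(&value.to_le_bytes());
    Some(())
}
```

For example, storing `0xDEAD_BEEF` at offset 4 puts the bytes `EF BE AD DE` at indices 4..8, and `load_u32(&mem, 4)` reads the same value back.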

So the idea is that you somehow allocate space for the data you wish to pass in (as in, find the address of its first byte); that address is your data_ptr. Then use linear_memory.set(data_ptr, &contents) or similar to copy the contents you wish to transfer into the wasm instance, and pass data_ptr, wrapped as RuntimeValue::I32, as an argument.
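A rough sketch of that host-side recipe, again with linear memory modelled as a `Vec<u8>` so it compiles without wasmi. `write_at` stands in for `linear_memory.set(data_ptr, &contents)`: it copies the whole slice or fails if it doesn't fit. The `Arg` enum is a hypothetical local stand-in for wasmi's `RuntimeValue`; on wasm32 both the pointer and the `usize` count are 32-bit, so both travel as `I32`.

```rust
/// Hypothetical stand-in for wasmi's RuntimeValue (only the I32 case).
#[derive(Debug, PartialEq)]
pub enum Arg {
    I32(i32),
}

#[derive(Debug, PartialEq)]
pub struct OutOfBounds;

/// Copy `contents` into `mem` starting at `data_ptr`, all or nothing.
pub fn write_at(mem: &mut [u8], data_ptr: u32, contents: &[u8]) -> Result<(), OutOfBounds> {
    let start = data_ptr as usize;
    let end = start.checked_add(contents.len()).ok_or(OutOfBounds)?;
    let dst = mem.get_mut(start..end).ok_or(OutOfBounds)?;
    dst.copy_from_slice(contents);
    Ok(())
}

/// Arguments for a guest export like `process_data(data_ptr, count)`.
/// `as i32` keeps the bit pattern, which is what a wasm i32 carries.
pub fn process_data_args(data_ptr: u32, count: usize) -> Vec<Arg> {
    vec![Arg::I32(data_ptr as i32), Arg::I32(count as i32)]
}
```

Note that a failed write leaves memory untouched, which is easier to reason about than a partial copy.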

Actually getting data_ptr can be non-trivial. You can't just pick a random address within the linear memory and write your data there; you have to allocate it properly, otherwise you risk very obscure, hard-to-debug issues. Allocating properly really depends on the specifics of the environment you are building. I can see two big categories: allocation within the wasm instance (i.e. the allocator code is compiled in) and allocation performed by the host environment (the host controls allocation, and rust's global allocator just forwards calls to some malloc/free imported from the host environment).
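Why a random address is dangerous is easiest to see with the simplest possible allocator. The `BumpAlloc` below is a hypothetical illustration, not what rust or wasmi actually use: it hands out non-overlapping, aligned regions of linear memory, so two pieces of data never clobber each other. Writing at an address the allocator doesn't know about gives no such guarantee.

```rust
/// A deliberately simple bump allocator over [next, limit) of linear memory.
pub struct BumpAlloc {
    next: u32,  // first free byte
    limit: u32, // one past the last usable byte
}

impl BumpAlloc {
    pub fn new(base: u32, limit: u32) -> Self {
        BumpAlloc { next: base, limit }
    }

    /// Reserve `size` bytes aligned to `align` (a power of two) and return
    /// the offset of the first byte, or None if the region is exhausted.
    pub fn alloc(&mut self, size: u32, align: u32) -> Option<u32> {
        assert!(align.is_power_of_two());
        // Round `next` up to the next multiple of `align`.
        let start = self.next.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.limit {
            return None;
        }
        self.next = end;
        Some(start)
    }
}
```

A bump allocator can never free individual blocks, which is one reason real guests use a proper global allocator (or forward to the host, as described above).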

As to an example, the best I can recommend is the project that I am working on, here. It uses the host allocator approach.

As a nice little trick, you can call linear_memory.grow to grow your linear memory by N pages and use the newly allocated pages to pass your data: calculate the starting address of the new pages and pass it as data_ptr into your wasm code. Obviously, that won't work if you want to allocate multiple times, because there is no way to return pages.
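The arithmetic behind that trick, sketched with a `Vec<u8>` standing in for linear memory (an assumption; `grow` here is a simulation, not wasmi's method). Like the wasm `memory.grow` instruction, it returns the page count *before* growing, so the first byte of the fresh pages is simply `before * 65536`.

```rust
pub const PAGE_SIZE: usize = 65_536; // wasm page size, fixed by the spec

/// Number of whole pages needed to hold `len` bytes (ceiling division).
pub fn pages_needed(len: usize) -> usize {
    (len + PAGE_SIZE - 1) / PAGE_SIZE
}

/// Simulated grow: extend `mem` by `delta` pages and return the previous
/// page count, or None if that would exceed `max_pages`.
pub fn grow(mem: &mut Vec<u8>, delta: usize, max_pages: usize) -> Option<usize> {
    let before = mem.len() / PAGE_SIZE;
    let after = before.checked_add(delta)?;
    if after > max_pages {
        return None;
    }
    mem.resize(after * PAGE_SIZE, 0);
    Some(before)
}

/// Grow just enough for `data`, copy it into the new pages and return the
/// offset to pass to the guest as data_ptr.
pub fn pass_via_fresh_pages(mem: &mut Vec<u8>, data: &[u8], max_pages: usize) -> Option<u32> {
    let before = grow(mem, pages_needed(data.len()), max_pages)?;
    let data_ptr = before * PAGE_SIZE;
    mem[data_ptr..data_ptr + data.len()].copy_from_slice(data);
    Some(data_ptr as u32)
}
```

With a one-page memory, passing a short payload grows it to two pages and yields `data_ptr == 65536`; every further call consumes at least one more page for good.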

@pepyakin
Collaborator

Ah, I found this; it might be simpler than substrate.

@andrewdavidmackenzie
Author

andrewdavidmackenzie commented Aug 27, 2019

Thanks!
I see that it is:

  • "getting" the exported linear "memory" from the wasm module (what size and how many pages, I don't know; I'll assume 1 page of 65K?)
  • allocating a buffer inside that memory on the rust-wasm side (the "alloc" function)
  • returning a pointer to it back to the rust-native side
  • copying the string (as bytes plus a null terminator) into that memory on the rust-native side (copying byte by byte!)
  • then calling the exported function with the pointer, so on the rust-wasm side it's working on its own linear memory, and getting an offset as the result
  • getting the result as a null-terminated string from that offset within the linear memory (copying byte by byte!)
  • deallocating the memory from the rust-wasm side (not sure the code in wasm/test.rs actually does anything or is correct!?)
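The whole round trip in one place, as a std-only simulation rather than real wasmi code. `Guest` is a hypothetical stand-in for an instantiated module: its `mem` field plays the exported "memory", and its methods play exported functions (`process` upper-cases a string purely for illustration; the original example's function may differ). The host side follows the steps above with a null-terminated string and a byte-by-byte copy in.

```rust
/// Hypothetical stand-in for an instantiated guest module.
pub struct Guest {
    pub mem: Vec<u8>,
    next: usize, // trivial bump pointer used by `alloc`
}

impl Guest {
    pub fn new(pages: usize) -> Self {
        // Start above 0 so no pointer handed out is ever null.
        Guest { mem: vec![0; pages * 65_536], next: 1024 }
    }

    /// "Export": reserve `size` bytes and return their offset.
    pub fn alloc(&mut self, size: usize) -> usize {
        let ptr = self.next;
        self.next += size;
        assert!(self.next <= self.mem.len(), "guest out of memory");
        ptr
    }

    /// "Export": upper-case the NUL-terminated string at `ptr` in place and
    /// return the offset of the result (here, the same buffer).
    pub fn process(&mut self, ptr: usize) -> usize {
        let mut i = ptr;
        while self.mem[i] != 0 {
            self.mem[i] = self.mem[i].to_ascii_uppercase();
            i += 1;
        }
        ptr
    }
}

/// Host side: alloc, copy in with a terminator, call, read back up to the
/// terminator. (No dealloc: this bump allocator cannot free.)
pub fn round_trip(guest: &mut Guest, input: &str) -> String {
    let bytes = input.as_bytes();
    let ptr = guest.alloc(bytes.len() + 1); // +1 for the NUL terminator
    for (i, b) in bytes.iter().chain(std::iter::once(&0)).enumerate() {
        guest.mem[ptr + i] = *b; // byte by byte, like set_value in a loop
    }
    let out = guest.process(ptr);
    let end = out + guest.mem[out..].iter().position(|&b| b == 0).unwrap();
    String::from_utf8(guest.mem[out..end].to_vec()).unwrap()
}
```

One limitation of the null-terminated convention shows up immediately: an input containing a `\0` byte would come back truncated.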

Lots of stuff happening I'm not familiar with, but if this is the recommended approach I'll give it a try.

@pepyakin
Collaborator

Yep,

  1. Yeap, the spec defines a wasm page to be 65536 bytes long. You can get the current size by calling the current_size method. To grow it, use the grow method, which returns the number of pages before growing.
  2. Yes!
  3. Yup, which is just a u32.
  4. Actually, it depends on the string. Null-terminated strings are common in environments where C is used, but on the web we can use the standard &str, which is essentially a tuple of a data pointer and the length of the data. Since the length is explicit, there is no need for a null terminator. But yeah, you have to copy byte-by-byte, since wasm code is sandboxed and has no means to access data outside of its linear memory.
  5. Exactly
  6. Right, byte-by-byte
  7. It depends
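The difference in point 4, sketched over a byte slice standing in for linear memory (the function names are illustrative). A C-style string has to be scanned for its terminator and can never contain a 0 byte; a `(ptr, len)` pair, the way a rust `&str` is laid out, needs only a bounds check plus UTF-8 validation.

```rust
/// C-style: scan from `ptr` until the first 0 byte (None if there is none).
pub fn read_nul_terminated(mem: &[u8], ptr: u32) -> Option<&[u8]> {
    let tail = mem.get(ptr as usize..)?;
    let len = tail.iter().position(|&b| b == 0)?;
    Some(&tail[..len])
}

/// &str-style: the length travels alongside the pointer, so there is no
/// scan and no terminator; just a bounds check and UTF-8 validation.
pub fn read_ptr_len(mem: &[u8], ptr: u32, len: u32) -> Option<&str> {
    let start = ptr as usize;
    let end = start.checked_add(len as usize)?;
    std::str::from_utf8(mem.get(start..end)?).ok()
}
```

Both still read from linear memory; the length-based form just lets the host fetch the result with a single bounded read instead of searching for the end.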

> Lots of stuff happening I'm not familiar with, but if this is the recommended approach I'll give it a try.

I am not sure I would call this the generally recommended approach. For instance, if your wasm instances are not retained, i.e. used only for one call and then discarded, then you don't have to do explicit deallocation on the wasm side. Null termination is another concern that depends on which use cases you want to support.

So I will repeat myself once more: everything really depends on the design of your wasm environment. The example I sent you is just that: an example. The substrate execution environment is another one, and there is another that I know of: the polkadot validation function. All of them do things a bit differently from each other.

@andrewdavidmackenzie
Author

FYI, I've cleaned up that code, refactored it a bit, and made the wasm module actually get built from the source code in the repo (the original had a sha1.wasm of unknown origin...). It's here:
https://github.com/andrewdavidmackenzie/wasmi-string

@andrewdavidmackenzie
Author

Thanks for all your help. I'll evolve my copy of wasmi-string to be more generic as a worked example, then try using the technique in my own app.

@pepyakin
Collaborator

Be careful with each unsafe, though! For example, data_ptr in your case shouldn't be 0, and I am not sure how the answer would fit if input_length < answer.len().

But otherwise, looks good!
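Two guards that address exactly those concerns, as a hypothetical guest-side sketch (the helper names are made up). Offset 0 is a valid index into linear memory, but rust treats a 0 pointer as null, and building a slice from a null pointer is undefined behaviour, so the first helper refuses it. The second refuses to write an answer that is longer than the buffer the host provided.

```rust
use std::slice;

/// Turn the (ptr, len) pair the host passed in into a slice, returning
/// None for a null pointer instead of invoking undefined behaviour.
///
/// # Safety
/// If `ptr` is non-null it must point to `len` initialized bytes that stay
/// valid and unmodified for the returned lifetime.
pub unsafe fn input_slice<'a>(ptr: *const u8, len: usize) -> Option<&'a [u8]> {
    if ptr.is_null() {
        None
    } else {
        Some(unsafe { slice::from_raw_parts(ptr, len) })
    }
}

/// Copy `answer` into the host-provided buffer. Err carries the length
/// that would have been needed, instead of overrunning the buffer.
pub fn write_answer(buf: &mut [u8], answer: &[u8]) -> Result<usize, usize> {
    if answer.len() > buf.len() {
        return Err(answer.len());
    }
    buf[..answer.len()].copy_from_slice(answer);
    Ok(answer.len())
}
```

Returning the needed length on failure lets the host retry with a bigger buffer rather than just failing.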

@andrewdavidmackenzie
Copy link
Author

Agreed. It’s full of unwrap()s, assumptions and some unsafe code, but it works so far, which is a major step forward.

I still don’t really understand how the WASM module's rust code is “allocating” that vector, and will have to read up on it. It’s using the linear memory as a heap, I guess, and rust is generating code to handle that.

In my real app, there will only be one function per module, it will be “pure” and I will ensure answer is smaller than input, so I should be able to make it work with this approach...
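Roughly what such an `alloc` export tends to look like. This is a common pattern sketched under assumptions, not necessarily the code in wasmi-string; the names `alloc`/`dealloc` and the boxed-slice approach are illustrative. The guest gets the buffer from rust's global allocator, which on wasm32 manages a heap inside linear memory, so the pointer it returns is already a linear-memory offset. `into_boxed_slice` makes the allocation exactly `size` bytes, which is what lets `dealloc` free it correctly later.

```rust
/// Guest export (illustrative name): hand the host a zeroed buffer of
/// exactly `size` bytes from rust's global allocator. In the real guest
/// crate this would also be marked #[no_mangle] so the host finds it by name.
pub extern "C" fn alloc(size: usize) -> *mut u8 {
    let buf: Box<[u8]> = vec![0u8; size].into_boxed_slice();
    Box::into_raw(buf) as *mut u8 // ownership passes to the caller
}

/// Matching export: rebuild the boxed slice so the allocator can free it.
///
/// # Safety
/// `ptr` and `size` must be exactly what an earlier `alloc` call returned
/// and received, and the buffer must not be used afterwards.
pub unsafe extern "C" fn dealloc(ptr: *mut u8, size: usize) {
    let slice: *mut [u8] = std::ptr::slice_from_raw_parts_mut(ptr, size);
    drop(unsafe { Box::from_raw(slice) });
}
```

If instances are discarded after one call, as in the app described above, `dealloc` can be skipped entirely, since the whole linear memory goes away with the instance.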

@pepyakin
Collaborator

If you have any further questions feel free to ask them here!

@andrewdavidmackenzie
Author

One thing I am not clear on (continued from original code) is:

  • alloc on the rust-wasm side returns a pointer to the allocated data (as i32)
  • but then memory.set/memory.get on the rust-native side use it as an "offset" into linear memory.

Seems to me that could be wrong, and we should calculate its offset from the first byte of linear memory?

WDYT?

@pepyakin
Collaborator

> alloc on rust-wasm side returns pointer to allocated data (as i32)

The pointer to the allocated data is actually an offset from the first byte of the linear memory, i.e. they are the same.

@andrewdavidmackenzie
Author

As you might have seen, by avoiding having to use null-terminated strings and working with the String lengths, I was able to remove the byte-by-byte copy on the getting of the response, using MemoryRef.get()

https://github.com/andrewdavidmackenzie/wasmi-string/blob/master/main/src/main.rs#L61

Any idea how to achieve the same with the sending?

It's unclear to me from the docs whether MemoryRef.set() https://docs.rs/wasmi/0.0.0/wasmi/struct.MemoryRef.html uses the size of the source Slice you are setting from... (I guess I'll try it and see!)

@pepyakin
Collaborator

I am not sure what you mean: get does a byte-by-byte copy.

The link to wasmi that you sent actually refers to the first wasmi ever: 0.0.0. I'd recommend using the latest docs : )

@andrewdavidmackenzie
Author

I've changed this now and it works.

The previous code had a loop with memory.set_value(), and so my code was doing it byte by byte (as it was originally looking for null termination of string).

Now it's using memory.set() call once for the whole slice.
(but I understand that inside that, it will be doing it one-by-one)

@NikVolf
Contributor

NikVolf commented Aug 29, 2019

> I've changed this now and it works.
>
> The previous code had a loop with memory.set_value(), and so my code was doing it byte by byte (as it was originally looking for null termination of string).
>
> Now it's using memory.set() call once for the whole slice.
> (but I understand that inside that, it will be doing it one-by-one)

Really, inside it will be a memcpy. How exactly it is compiled is highly platform-specific; on modern x86-64 it might be a single instruction for the whole array.
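The contrast in plain rust, as an illustration only (this doesn't show wasmi's internals). A per-element loop is what a series of `set_value` calls amounts to; `copy_from_slice` is documented to copy using a memcpy, which the compiler is then free to lower to whatever bulk copy the platform does best.

```rust
/// Models a loop of per-byte set_value calls.
pub fn copy_bytewise(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len());
    for (d, s) in dst.iter_mut().zip(src) {
        *d = *s;
    }
}

/// Models a single set() with the whole slice: `copy_from_slice` performs
/// a memcpy and panics if the two lengths differ.
pub fn copy_bulk(dst: &mut [u8], src: &[u8]) {
    dst.copy_from_slice(src);
}
```

Both produce identical bytes; the difference is only in how much per-byte overhead the generated code carries (and optimizers may well turn a simple loop like the first into a memcpy anyway).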

3 participants