Is there a performance bottleneck in a multithread environment? #5243

Closed
asurdo opened this issue Nov 10, 2022 · 6 comments

Comments


asurdo commented Nov 10, 2022

Question:

I use wasmtime in a multithreaded C++ program. I expected performance to improve as I added threads, but it does not. Is there a bottleneck in wasmtime? And how can I avoid it?

Here is my performance test log.

My machine has 24 cores and 64GB of memory.

1 thread: speed 8217/s, average time cost 120us
2 threads: speed 10137/s, average time cost 196us
10 threads: speed 11190/s, average time cost 892us
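
As a rough sanity check on the log above, the reported throughput and latency can be related with Little's law (calls in flight = throughput × average latency). The sketch below is not from the original post: the `Sample` struct and the function names are hypothetical, and it assumes the logged latency is the wall-clock time of one whole call.

```cpp
#include <cassert>

// One row of the performance log (names are illustrative, not from the post).
struct Sample {
    int threads;
    double calls_per_sec;
    double avg_latency_us;
};

// Little's law: average number of calls in flight = throughput * latency.
double in_flight(const Sample& s) {
    assert(s.calls_per_sec >= 0.0 && s.avg_latency_us >= 0.0);
    return s.calls_per_sec * s.avg_latency_us * 1e-6;
}

// Throughput gain relative to a baseline row (here, the 1-thread run).
double speedup(const Sample& s, const Sample& base) {
    assert(base.calls_per_sec > 0.0);
    return s.calls_per_sec / base.calls_per_sec;
}
```

With the figures from the log, `in_flight` comes out close to the thread count in every row (about 0.99, 1.99, and 9.98), which suggests each thread was busy or blocked for the full duration of each call, while `speedup` for the 10-thread run over the 1-thread run is only about 1.36.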

My code logic

The engine and linker are singletons. Each call creates a new store and uses WASI and some host functions.

Author

asurdo commented Nov 10, 2022

PS: I compile the module only once and instantiate it on every call.
I separated one call into three steps: init store, instantiate, and call instance func. The step whose time cost increases the most is call instance func.

Contributor

bjorn3 commented Nov 10, 2022

The bottleneck is probably in the Linux kernel. As I understand it, changing memory mappings acquires a process-wide lock on the memory mappings, and once the change is done, every currently running thread of the process is interrupted to make sure its CPU core knows about the changed mapping. Every time you instantiate a module, memory mappings are changed. This effectively means that the instantiation count per second across the whole process is limited by how fast a single core can handle memory mapping changes, and in addition every instantiation briefly pauses execution of the other threads.
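
The effect described above can be probed without Wasmtime at all. The following is a minimal sketch, assuming Linux or another POSIX system: several threads repeatedly create, touch, and destroy an anonymous mapping, and the function reports how many cycles completed and how long they took. The names `Result` and `churn_mappings` are hypothetical, and this is only a stand-in for the mapping changes made during instantiation, not a reproduction of what Wasmtime does.

```cpp
#include <sys/mman.h>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

struct Result {
    long ops;        // completed map/touch/unmap cycles
    double seconds;  // wall-clock time for all threads
};

// Each thread maps a region, faults in one page, and unmaps it again.
// Every mmap/munmap changes the process-wide address space.
Result churn_mappings(int threads, int iters_per_thread,
                      std::size_t len = 1 << 16) {
    assert(threads > 0 && iters_per_thread > 0 && len > 0);
    std::atomic<long> ok{0};
    auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&ok, iters_per_thread, len] {
            for (int i = 0; i < iters_per_thread; ++i) {
                void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED) continue;
                static_cast<volatile char*>(p)[0] = 1;  // fault in one page
                if (munmap(p, len) == 0)
                    ok.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();

    std::chrono::duration<double> dt =
        std::chrono::steady_clock::now() - start;
    return {ok.load(), dt.count()};
}
```

Comparing `ops / seconds` for `churn_mappings(1, N)` and `churn_mappings(10, N)` on the target machine gives a rough idea of how well mapping changes scale there. The outcome depends on kernel version and hardware, so no particular numbers should be expected.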

Contributor

bjorn3 commented Nov 10, 2022

I believe using the pooling allocation strategy (set using config.allocation_strategy()) is more performant than the default on demand strategy.

Author

asurdo commented Nov 10, 2022

> I believe using the pooling allocation strategy (set using config.allocation_strategy()) is more performant than the default on demand strategy.

Is config.allocation_strategy() only for Rust? I can't find it in the C API. And in fact, I use jemalloc for my C++ program, malloc is in user mode.

I avoided this problem by using multiple processes (at the cost of more memory for hundreds of modules), and it worked as I expected.

Contributor

bjorn3 commented Nov 10, 2022

> Is config.allocation_strategy() only for Rust?

Looks like it. Seems to be an oversight.

> And in fact, I use jemalloc for my C++ program, malloc is in user mode.

When instantiating, Wasmtime mmaps a (memfd) file containing the initial data of the wasm module's linear memory. This lets the kernel load parts lazily and copy only the data that is modified, as opposed to having Wasmtime write it all at once even if the vast majority is never used or modified. This is a necessity for fast instantiation.
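
The memfd-plus-private-mapping idea can be illustrated in isolation. This is a minimal sketch, assuming Linux with glibc 2.27 or later for `memfd_create`: it writes an "initial image" into a memfd, maps it twice with `MAP_PRIVATE` (standing in for two instances), and checks that a write through one mapping is visible neither in the other mapping nor in the backing file. The function name and the one-page image are hypothetical; this is not Wasmtime's code.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Returns true if two private mappings of the same memfd image are isolated
// from each other and from the backing file (copy-on-write behaviour).
bool private_mappings_are_isolated() {
    const std::size_t len = 4096;

    // Anonymous in-memory file holding the initial data image.
    int fd = memfd_create("initial-image", 0);
    if (fd < 0) return false;
    char image[len];
    std::memset(image, 'A', len);
    if (write(fd, image, len) != static_cast<ssize_t>(len)) {
        close(fd);
        return false;
    }

    // Two "instances" map the same image; pages are shared until written.
    void* a = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    void* b = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        close(fd);
        return false;
    }
    char* inst_a = static_cast<char*>(a);
    char* inst_b = static_cast<char*>(b);

    inst_a[0] = 'B';  // the kernel copies just this page for instance A

    char on_file = 0;
    bool ok = pread(fd, &on_file, 1, 0) == 1 &&
              on_file == 'A' &&    // backing image is untouched
              inst_a[0] == 'B' &&  // A sees its own write
              inst_a[1] == 'A' &&  // the rest of A still matches the image
              inst_b[0] == 'A';    // B is unaffected

    munmap(a, len);
    munmap(b, len);
    close(fd);
    return ok;
}
```

Nothing is copied when the mappings are created; only the page that is written gets duplicated, which is why this approach keeps instantiation cheap even for large initial images.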

@alexcrichton
Member

I believe that this is largely the same issue as #4637, so I'm going to close in favor of that.

The pooling allocator is known to help here. The new *_keep_resident configuration options are also known to help, as are very recent Linux kernels. This is an ongoing area of investigation for WebAssembly runtimes and is something we're always interested in improving on.
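
To make the pooling idea concrete, here is a conceptual sketch of a slot pool. It reserves all slots with one mapping up front, so acquiring and releasing a slot never creates or destroys a mapping. The class `SlotPool` and its `keep_resident` parameter are hypothetical and only loosely modelled on the options mentioned above; this is not Wasmtime's implementation, and the reset strategy shown (zeroing a resident prefix with `memset` and dropping the rest with `madvise`) is an assumption about how such a scheme could work.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <new>
#include <stdexcept>
#include <vector>

class SlotPool {
public:
    // slot_len and keep_resident must be multiples of the page size.
    SlotPool(std::size_t slots, std::size_t slot_len, std::size_t keep_resident)
        : slot_len_(slot_len), keep_resident_(keep_resident),
          total_(slots * slot_len) {
        const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        if (slots == 0 || slot_len == 0 || slot_len % page != 0 ||
            keep_resident % page != 0 || keep_resident > slot_len)
            throw std::invalid_argument("sizes must be page-aligned");
        // One mapping for the whole pool, created once.
        void* p = mmap(nullptr, total_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        base_ = static_cast<char*>(p);
        for (std::size_t i = 0; i < slots; ++i) free_.push_back(i);
    }
    ~SlotPool() { munmap(base_, total_); }
    SlotPool(const SlotPool&) = delete;
    SlotPool& operator=(const SlotPool&) = delete;

    // Hands out a zeroed slot, or nullptr if the pool is exhausted.
    char* acquire() {
        std::lock_guard<std::mutex> lock(mu_);
        if (free_.empty()) return nullptr;
        std::size_t i = free_.back();
        free_.pop_back();
        return base_ + i * slot_len_;
    }

    // Resets the slot to zero and returns it to the pool.
    void release(char* slot) {
        // Zero the resident prefix in user space, keeping its pages mapped.
        std::memset(slot, 0, keep_resident_);
        // Let the kernel drop the rest; it reads back as zero on next access.
        if (keep_resident_ < slot_len_)
            madvise(slot + keep_resident_, slot_len_ - keep_resident_,
                    MADV_DONTNEED);
        std::lock_guard<std::mutex> lock(mu_);
        free_.push_back(static_cast<std::size_t>(slot - base_) / slot_len_);
    }

private:
    std::size_t slot_len_, keep_resident_, total_;
    char* base_ = nullptr;
    std::mutex mu_;
    std::vector<std::size_t> free_;
};
```

A caller would construct the pool once at startup, for example with two-page slots and a one-page resident prefix, then call `acquire()` per instantiation and `release()` when the instance is dropped. The trade-off is that address space for every slot is reserved for the lifetime of the pool.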
