Is there a performance bottleneck in a multithread environment? #5243

asurdo · 2022-11-10T07:03:56Z

Question:

I use wasmtime in a multithreaded C++ program. I expected throughput to improve as I added threads, but it does not scale. Is there a bottleneck in wasmtime, and how can I avoid it?

Here is my performance test log.

My machine has 24 cores and 64 GB of memory.

1 thread: speed 8217/s, average time cost 120us
2 threads: speed 10137/s, average time cost 196us
10 threads: speed 11190/s, average time cost 892us
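A quick sanity check on these figures: if every thread issues calls back to back, throughput should be roughly the thread count divided by the average latency. That gives about 8333/s, 10204/s and 11211/s for the three rows, close to the measured values, so the extra threads are mostly paying for themselves in longer per-call latency. The helper names below are illustrative only, a minimal sketch of that arithmetic.

```cpp
#include <cstddef>

// Throughput (calls per second) expected when `threads` workers each take
// `latency_us` microseconds per call and run calls back to back.
double predicted_throughput(std::size_t threads, double latency_us) {
    return static_cast<double>(threads) * 1e6 / latency_us;
}

// Fraction of ideal linear scaling achieved: measured throughput divided by
// (single-thread throughput * thread count). 1.0 means perfect scaling.
double scaling_efficiency(double base_per_s, double measured_per_s,
                          std::size_t threads) {
    return measured_per_s / (base_per_s * static_cast<double>(threads));
}
```

With the numbers from the log, `scaling_efficiency(8217, 10137, 2)` is about 0.62 and `scaling_efficiency(8217, 11190, 10)` is about 0.14.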

My code logic:

The engine and linker are singletons. Each call creates a new store. I use WASI and some host functions.

asurdo · 2022-11-10T07:14:15Z

PS: I compile the module only once and instantiate it on every call.
I separated one call into three steps: init store, instantiate, and call instance func. The step whose time cost increases the most is call instance func.
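For anyone wanting to reproduce this kind of per-step breakdown, a timing harness might look roughly like the sketch below. The three callbacks are hypothetical placeholders for "init store", "instantiate" and "call instance func"; nothing here uses the wasmtime API, and the callbacks are assumed to be safe to invoke from several threads at once.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Accumulated wall-clock microseconds per step, plus the number of calls.
struct PhaseTotals {
    double init_store_us = 0;
    double instantiate_us = 0;
    double call_us = 0;
    std::size_t calls = 0;
};

// Runs `fn` once and returns its duration in microseconds.
inline double time_us(const std::function<void()>& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(end - start).count();
}

// Each worker runs `iters` calls, timing the three steps separately into a
// thread-local total. Totals are published once per thread and summed after
// all workers have joined, so no locking is needed on the hot path.
PhaseTotals run_benchmark(std::size_t threads, std::size_t iters,
                          const std::function<void()>& init_store,
                          const std::function<void()>& instantiate,
                          const std::function<void()>& call_func) {
    std::vector<PhaseTotals> per_thread(threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            PhaseTotals local;
            for (std::size_t i = 0; i < iters; ++i) {
                local.init_store_us += time_us(init_store);
                local.instantiate_us += time_us(instantiate);
                local.call_us += time_us(call_func);
                ++local.calls;
            }
            per_thread[t] = local;
        });
    }
    for (auto& w : workers) w.join();

    PhaseTotals sum;
    for (const auto& p : per_thread) {
        sum.init_store_us += p.init_store_us;
        sum.instantiate_us += p.instantiate_us;
        sum.call_us += p.call_us;
        sum.calls += p.calls;
    }
    return sum;
}
```

Dividing each total by `calls` gives the average cost of a step, which can then be compared across thread counts.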

bjorn3 · 2022-11-10T09:43:21Z

The bottleneck is probably in the Linux kernel. As I understand it, changing memory mappings acquires a process-wide lock on the memory mappings, and once the change is done, every currently running thread of the process is interrupted to make sure that its CPU core knows about the changed mapping. Every time you instantiate a module, memory mappings are changed. This effectively means that the instantiation count per second across the whole process is limited by how fast a single core can handle memory mapping changes, and in addition every instantiation pauses execution of the other threads for a moment.
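One way to probe this explanation without wasmtime is a small Linux-only microbenchmark that does nothing but map, touch and unmap anonymous memory from several threads. The sketch below is an assumption-laden stand-in for instantiation, not what wasmtime does internally; the function names and the region size are made up for illustration.

```cpp
#include <sys/mman.h>

#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// On each of `threads` threads, repeat `iters` times: map `bytes` of
// anonymous memory, touch the first page, unmap it. Returns the number of
// cycles in which both mmap and munmap succeeded.
std::size_t mmap_churn(std::size_t threads, std::size_t iters,
                       std::size_t bytes) {
    std::atomic<std::size_t> ok{0};
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (std::size_t i = 0; i < iters; ++i) {
                void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED) continue;
                // Force a page fault so the mapping is actually populated.
                static_cast<volatile char*>(p)[0] = 1;
                if (munmap(p, bytes) == 0) {
                    ok.fetch_add(1, std::memory_order_relaxed);
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return ok.load();
}

// Wall-clock seconds taken by one churn run with the given parameters.
double time_churn_s(std::size_t threads, std::size_t iters,
                    std::size_t bytes) {
    auto start = std::chrono::steady_clock::now();
    mmap_churn(threads, iters, bytes);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

If the process-wide lock is the limiting factor, `time_churn_s(n, iters, bytes)` would be expected to grow with `n` instead of staying flat, even though each thread does the same amount of work. Results will depend on the kernel version and hardware.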

bjorn3 · 2022-11-10T09:46:08Z

I believe using the pooling allocation strategy (set using config.allocation_strategy()) is more performant than the default on demand strategy.

asurdo · 2022-11-10T12:33:19Z

> I believe using the pooling allocation strategy (set using config.allocation_strategy()) is more performant than the default on demand strategy.

Is config.allocation_strategy() only for Rust? I can't find it in the C API. And in fact, I use jemalloc for my C++ program, so malloc runs in user mode.

I worked around this problem by using multiple processes (at the cost of more memory for hundreds of modules), and it worked as I expected.

bjorn3 · 2022-11-10T12:46:57Z

> Is config.allocation_strategy() only for Rust?

Looks like it. Seems to be an oversight.

> And in fact, I use jemalloc for my C++ program, malloc is in user mode.

When instantiating, wasmtime mmaps a (memfd) file containing the initial data of the wasm module's linear memory. This allows the kernel to load parts lazily and to copy only when the data is modified, as opposed to having wasmtime write it all at once even if the vast majority isn't used or modified at all. This is a necessity for fast instantiation.
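The kernel mechanism described here can be illustrated with a short Linux-only sketch: an in-memory file stands in for the module's initial memory image, and each "instance" maps it privately so that pages are shared until written. This is only an illustration of memfd plus copy-on-write mappings under assumed names (`make_image`, `map_instance`), not wasmtime's actual code.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstring>

// Creates an in-memory file of `size` bytes filled with `fill`, standing in
// for a module's initial linear-memory image. Returns the fd, or -1 on error.
int make_image(std::size_t size, unsigned char fill) {
    int fd = memfd_create("initial-memory", 0);
    if (fd < 0) return -1;
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        close(fd);
        return -1;
    }
    // A shared mapping is used once, only to write the image into the file.
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    std::memset(p, fill, size);
    munmap(p, size);
    return fd;
}

// Maps the image copy-on-write. Pages are read from the file on first
// access and copied only when this mapping writes to them, so writes never
// reach the file or other mappings. Returns nullptr on error.
unsigned char* map_instance(int fd, std::size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    return p == MAP_FAILED ? nullptr : static_cast<unsigned char*>(p);
}
```

Two calls to `map_instance` on the same fd give two independent views: a write through one is not visible through the other, and untouched pages are never copied. Each such mapping is still an mmap call, which is why it runs into the address-space lock discussed earlier in the thread.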

alexcrichton · 2022-11-10T14:42:47Z

I believe that this is largely the same issue as #4637, so I'm going to close in favor of that.

The pooling allocator is known to help here. New *_keep_resident configuration options are also known to help. Very recent Linux kernels are also known to help. This is an ongoing area of investigation for WebAssembly runtimes and is something we're always interested in improving.

alexcrichton closed this as completed Nov 10, 2022
