
Performance bottleneck when creating Instances / growing memory in parallel #2563

Closed
jcaesar opened this issue Sep 8, 2021 · 3 comments
Labels: 🎉 enhancement New feature! · priority-medium Medium priority issue · 🏚 stale Inactive issues or PR

jcaesar commented Sep 8, 2021

Motivation

I'd like to invoke user-defined functionality in a stream data processing setting. To ensure that there are no odd influences between subsequent (but usually unrelated) stream events, I'd like to create a new Instance for every event I process.
Also, I'd like to process loads of messages, possibly millions per second (in a cluster).

So I need lots of instances per second.

Problem

Wasmer instance creation doesn't benefit from parallelization: 1 thread gives 111 k instances per second, 2 threads only 80 k/s, and more threads make the situation even worse.
(The numbers are from a Ryzen 7 3700X 8-core, Linux 5.13.13-arch1-1. My production machines will be larger…)

After a bit of benchmarking and experimenting, I arrived at the conclusion that the instance memory must be at fault.
[multithread flamegraph]
If an instance has accessible memory, creating it triggers an mmap/mprotect call, and dropping it the corresponding munmap (look for the __GI_m* calls in the flamegraph, right below the [unknown] stacks). The following experiment confirms that these calls are the bottleneck.

Proposed solution?

I have experimented with avoiding the mmap calls by reusing wasmer_vm::Mmaps. This can be done without modifying wasmer itself, albeit with some code duplication.

This has the desired effect of letting instance creation scale near-linearly with the number of cores (e.g. 1 thread: 115 k/s, 2 threads: 233 k/s).

The problem is that, to prevent one instance from seeing memory left behind by another, each Mmap must be zeroed out up to its accessible size before reuse. That zeroing is only cheaper than a fresh mmap for memories up to roughly 10 pages (in the single-threaded case).
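
For illustration, here is a minimal sketch of such a reusable mapping, using raw libc calls as a stand-in; the PooledMmap type and its methods are invented for this example and are not wasmer_vm's actual API:

```rust
// Minimal sketch of the reuse experiment (assumed names; not wasmer_vm's API).
// A PooledMmap reserves a large PROT_NONE region, makes a prefix accessible,
// and is zeroed before being handed to the next instance instead of being
// munmapped and re-mmapped.
use std::ptr;

struct PooledMmap {
    ptr: *mut u8,
    reserved: usize,   // total reserved bytes (PROT_NONE past `accessible`)
    accessible: usize, // bytes currently readable/writable
}

impl PooledMmap {
    fn new(reserved: usize, accessible: usize) -> Self {
        unsafe {
            let p = libc::mmap(
                ptr::null_mut(),
                reserved,
                libc::PROT_NONE,
                libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
                -1,
                0,
            );
            assert_ne!(p, libc::MAP_FAILED, "mmap failed");
            // Make the prefix usable, like a wasm memory's initial size.
            assert_eq!(
                libc::mprotect(p, accessible, libc::PROT_READ | libc::PROT_WRITE),
                0
            );
            PooledMmap { ptr: p as *mut u8, reserved, accessible }
        }
    }

    /// Prepare for reuse by a new instance: erase everything the previous
    /// instance could have written. This is the zeroing cost discussed above.
    fn reset(&mut self) {
        unsafe { ptr::write_bytes(self.ptr, 0, self.accessible) }
    }
}

impl Drop for PooledMmap {
    fn drop(&mut self) {
        unsafe { libc::munmap(self.ptr as *mut libc::c_void, self.reserved) };
    }
}
```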

So, I'm looking for alternatives.

Alternatives

At a single thread, even with one mprotect/mmap/munmap per instance memory allocation, I can get about 100 k instances per second. That's not awesome, but probably enough for most of my use cases: I can have a single thread create all the instances and shove them through an MPMC channel. (If in doubt, I can use more, smaller machines. Cloud and all.)
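
A sketch of that workaround, assuming the Wasmer 2.x API, that Instance can be moved across threads in the chosen engine configuration, and crossbeam_channel as the MPMC channel (module and capacity are placeholders):

```rust
use wasmer::{imports, Instance, Module, Store};

fn main() {
    let store = Store::default();
    let module = Module::new(&store, "(module (memory 1))").unwrap();
    // crossbeam_channel is MPMC; the Receiver is Clone.
    let (tx, rx) = crossbeam_channel::bounded::<Instance>(1024);

    // One producer thread pays all the mmap/mprotect cost serially.
    std::thread::spawn(move || loop {
        let instance = Instance::new(&module, &imports! {}).unwrap();
        if tx.send(instance).is_err() {
            break; // all consumers have hung up
        }
    });

    // Worker threads each pull a pristine instance per stream event.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let rx = rx.clone();
            std::thread::spawn(move || {
                while let Ok(instance) = rx.recv() {
                    // ... process one event with `instance` ...
                    drop(instance);
                }
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
}
```

Note that instances are still dropped on the worker threads here, so the munmap half of the cost lands there; handing spent instances to a dedicated reaper thread would be a variation.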

I'm also considering munmapping the chunks at the beginning of a Mmap that an instance has already used. That would free the memory and save the zeroing or re-mmapping, but I'm not sure it's much cheaper, since munmap still messes with the TLB(?).
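
On Linux, munmap accepts any page-aligned sub-range of an existing mapping, including a prefix, so this idea could look roughly like the following (hypothetical helper name, error handling minimal):

```rust
// Hypothetical sketch: release the already-used, page-aligned prefix of a
// large mapping so the next instance starts on fresh (hence zeroed) pages.
unsafe fn release_used_prefix(base: *mut u8, used_bytes: usize) -> *mut u8 {
    // munmap may unmap a prefix of a mapping; the remainder stays valid.
    assert_eq!(libc::munmap(base as *mut libc::c_void, used_bytes), 0);
    // The next instance's memory begins right after the released prefix.
    base.add(used_bytes)
}
```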

@jcaesar jcaesar added the 🎉 enhancement New feature! label Sep 8, 2021
@jcaesar jcaesar changed the title Performance bottleneck when creating Instances / growing memory Performance bottleneck when creating Instances / growing memory in parallel Sep 10, 2021
@Amanieu Amanieu added the priority-medium Medium priority issue label Oct 20, 2021
@epilys epilys modified the milestones: v3.0, v3.x Apr 27, 2022

jcaesar commented Dec 2, 2022

For context: wasmtime seems to have given up on this problem: bytecodealliance/wasmtime#4637 (comment)

@ptitSeb ptitSeb modified the milestones: v3.x, v4.x May 3, 2023

stale bot commented May 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the 🏚 stale Inactive issues or PR label May 3, 2024

stale bot commented Jun 5, 2024

Feel free to reopen the issue if it has been closed by mistake.

@stale stale bot closed this as completed Jun 5, 2024