
Implement the pooling instance allocator. #2518

Merged
merged 33 commits into bytecodealliance:main from add-allocator
Mar 8, 2021

Conversation


@peterhuene peterhuene commented Dec 17, 2020

This PR implements the pooling instance allocator.

The allocation strategy can be set with Config::with_allocation_strategy.
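
Selecting the strategy might look something like the following sketch. This is illustrative only: `Config::with_allocation_strategy` is named in this PR, but the `InstanceAllocationStrategy::Pooling` variant shape and the limit types shown here are assumptions based on the API around the time of this PR, not verified against the merged code.

```rust
use wasmtime::*;

// Hedged sketch: opt into the pooling allocator with explicit limits.
// The variant and limit-type names below are assumed, not confirmed.
let mut config = Config::new();
config.with_allocation_strategy(InstanceAllocationStrategy::Pooling {
    strategy: PoolingAllocationStrategy::NextAvailable,
    module_limits: ModuleLimits::default(),
    instance_limits: InstanceLimits::default(),
});
let engine = Engine::new(&config);
```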

The pooling strategy uses the pooling instance allocator to preallocate a
contiguous region of memory for instantiating modules that adhere to various
limits.

The intention of the pooling instance allocator is to reserve, ahead of time, as
much as possible of the host address space needed for instantiating modules.
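
The preallocation idea can be sketched as a fixed-capacity pool that hands out slot indices from a free list, so instantiation never asks the OS for memory after startup. This is purely illustrative; `SlotPool` is a made-up name, not a wasmtime type.

```rust
// Illustrative sketch (not wasmtime's actual implementation): a pool with a
// fixed number of instance slots, managed by a simple free list of indices.
struct SlotPool {
    free: Vec<usize>, // indices of currently unused slots
}

impl SlotPool {
    fn new(max_instances: usize) -> Self {
        // Reverse so that slot 0 is handed out first.
        SlotPool { free: (0..max_instances).rev().collect() }
    }

    /// Returns a slot index, or None when the pool is exhausted.
    fn allocate(&mut self) -> Option<usize> {
        self.free.pop()
    }

    /// Returns a slot to the pool for reuse.
    fn deallocate(&mut self, index: usize) {
        self.free.push(index);
    }
}

fn main() {
    let mut pool = SlotPool::new(2);
    let a = pool.allocate();
    let b = pool.allocate();
    let c = pool.allocate(); // pool exhausted here
    println!("{:?} {:?} {:?}", a, b, c);
    pool.deallocate(a.unwrap());
    println!("{:?}", pool.allocate());
}
```

Because allocation and deallocation are just free-list pushes and pops, instantiating under load avoids mmap/munmap traffic entirely, which is the throughput point the uffd discussion below builds on.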

This PR also implements the uffd feature in Wasmtime that enables
handling page faults in user space; this can help to reduce kernel lock
contention and thus increase throughput when many threads are
continually allocating and deallocating instances.

See the related RFC.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:wasm wasmtime:api Related to the API of the `wasmtime` crate itself labels Dec 17, 2020

peterhuene commented Dec 17, 2020

Hmm, MADV_DONTNEED may not be implemented for aarch64-linux. I'll look into the test failure soon.



cfallin commented Dec 17, 2020

> Hmm, MADV_DONTNEED may not be implemented for aarch64-linux. I'll look into the test failure soon.

The aarch64 tests run on qemu, which has this lovely bit of implementation in its syscall handling (link):

    case TARGET_NR_madvise:
        /* A straight passthrough may not be safe because qemu sometimes
           turns private file-backed mappings into anonymous mappings.
           This will break MADV_DONTNEED.
           This is a hint, so ignoring and returning success is ok.  */
        return 0;

At some point we'd ideally run our tests on real hardware, but for now I suspect the best option would just be to disable the unit test and anything that depends on the zeroing semantics. (Or maybe a feature flag to explicitly memset/bzero instead? That's bad if it was only sparsely committed, though...)
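
For context, the zeroing semantics in question can be demonstrated with a small standalone program. This is a hypothetical sketch for Linux x86-64 only, with the raw constants hard-coded: `MADV_DONTNEED` on a private anonymous mapping drops its pages, so the next read faults in fresh zero pages — exactly the behavior qemu's stubbed-out madvise breaks.

```rust
// Linux x86-64 demonstration of MADV_DONTNEED zeroing semantics.
// Constants are the Linux x86-64 values; this will NOT behave the same
// under qemu user-mode emulation, which silently ignores madvise.
extern "C" {
    fn mmap(addr: *mut u8, len: usize, prot: i32, flags: i32, fd: i32, off: i64) -> *mut u8;
    fn madvise(addr: *mut u8, len: usize, advice: i32) -> i32;
}

const PROT_READ: i32 = 0x1;
const PROT_WRITE: i32 = 0x2;
const MAP_PRIVATE: i32 = 0x02;
const MAP_ANONYMOUS: i32 = 0x20;
const MADV_DONTNEED: i32 = 4;
const PAGE: usize = 4096;

fn main() {
    unsafe {
        let p = mmap(
            std::ptr::null_mut(),
            PAGE,
            PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS,
            -1,
            0,
        );
        assert!(p as isize != -1, "mmap failed");
        *p = 42; // dirty the page
        assert_eq!(*p, 42);
        // "Decommit" the page: the kernel may reclaim it immediately.
        assert_eq!(madvise(p, PAGE, MADV_DONTNEED), 0);
        // A subsequent read sees a zero page again.
        assert_eq!(*p, 0);
        println!("page zeroed after MADV_DONTNEED");
    }
}
```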

@alexcrichton
Member

We already have a different flag for running in qemu to reduce virtual memory usage overhead since QEMU behaves differently than native processes in that respect, so perhaps that could also be where we configure "hey wasmtime madvise doesn't work as expected"

@peterhuene peterhuene force-pushed the add-allocator branch 4 times, most recently from b6abb64 to fbcb0c6 Compare January 29, 2021 19:54
@peterhuene peterhuene force-pushed the add-allocator branch 4 times, most recently from b55a66a to 404812e Compare February 5, 2021 23:12
@github-actions github-actions bot added the wasmtime:c-api Issues pertaining to the C API. label Feb 5, 2021

@peterhuene peterhuene force-pushed the add-allocator branch 4 times, most recently from b2841d7 to 03118ad Compare February 11, 2021 05:37
This change makes the storage of `Table` more internally consistent.

Elements are stored as raw pointers for both static and dynamic table storage.

Explicitly storing elements as pointers removes the assumptions the pooling
allocator was making about the size and default representation of the elements.

However, care must be taken to properly clone externrefs for table operations.
This commit fails translation of modules where a segment offset, when added to
the data length, overflows.
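
The overflow check described in that commit message can be sketched as follows. This is a hedged, illustrative version, not the actual wasmtime code: the point is that a `u32` offset plus data length is computed with `checked_add`, so a wrapped sum fails translation instead of slipping past a later bounds check.

```rust
// Illustrative sketch (not the actual wasmtime code) of the translation-time
// check: reject a segment whose offset plus data length overflows u32.
fn segment_end(offset: u32, data_len: u32) -> Option<u32> {
    // `checked_add` yields None on overflow, which fails translation.
    offset.checked_add(data_len)
}

fn main() {
    assert_eq!(segment_end(16, 4), Some(20));
    assert_eq!(segment_end(u32::MAX, 1), None); // overflow: translation fails
    println!("overflow check behaves as expected");
}
```
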
@peterhuene peterhuene force-pushed the add-allocator branch 4 times, most recently from f624882 to c1231e7 Compare March 6, 2021 06:01
This commit adds a "pooling" variant to the wast tests that uses the pooling
instance allocation strategy.

This should help with the test coverage of the pooling instance allocator.
This commit extracts out a common pattern of finding a passive element or data
segment into a `find_passive_segment` method.
@peterhuene
Member Author

@alexcrichton I think all of the feedback has now been addressed. FYI: the commit you stopped your review at was "fix bad merge".

This is also running the wast tests with the pooling allocator (+uffd on linux).

My TODO list following this PR:

@alexcrichton alexcrichton left a comment


Everything here's looking great to me, thanks again for being patient with me :)

As one final follow-up question as well, in the future (not as part of this PR since it's fine as-is), do you think it would be possible to use malloc/free to manage instance/table pools? The memory pool makes sense to use mmap directly since we're doing such funky things with permissions (and it's all page-aligned anyway), and the stack pool makes sense since it's all page-aligned too. For instances/tables, however, it seems like there will inevitably be some degree of fragmentation because we round up to page sizes, but I don't think that's strictly necessary (especially now that uffd isn't watching those areas), so perhaps we could use malloc/free?

I've got a few more questions/follow-ups below, but otherwise r=me

Review threads:

  • crates/wasmtime/src/module.rs
  • crates/runtime/src/table.rs (outdated)
  • crates/runtime/src/instance.rs (outdated)
  • crates/runtime/src/instance/allocator/pooling.rs (outdated)
  • crates/runtime/src/instance/allocator/pooling/uffd.rs (outdated, two threads)
* Improve comments.
* Drop old table element *after* updating the table.
* Extract out the same `cfg_if!` to a single constant.
@peterhuene peterhuene force-pushed the add-allocator branch 2 times, most recently from fdc1b09 to 3d05a4f Compare March 8, 2021 17:45
@peterhuene
Copy link
Member Author

Regarding the page alignment for instances and tables: we don't need it for instances for the pooling allocator, but we do need it for tables because the pooling allocator "decommits" table memory by page and you don't want any page to have elements from multiple tables.

Instances are page-aligned only as an artifact from when the pooling allocator had one giant mmap'd region and the memories/tables of the instance came next, so they needed page alignment back then.

I'll fix the instance pool to not page-align the instances and instead align to Instance's alignment requirements in a follow-up PR.

Regarding malloc/free for instances and tables: I think there's no inherent reason why they can't be used, but part of the intention of this design was to preallocate both (a desirable thing to do for a service), and since Instance is a magical DST, managing one's own memory mapping is probably the easiest way to accomplish that.
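
The two sizing rules above reduce to ordinary align-up arithmetic. This is an illustrative sketch with made-up numbers: table slots round up to a whole number of pages (decommit happens page-at-a-time, so a page must never span two tables), while instance slots only need `Instance`'s natural alignment, e.g. 16 bytes.

```rust
// Round `size` up to the next multiple of `align` (a power of two).
fn round_up(size: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (size + align - 1) & !(align - 1)
}

fn main() {
    const PAGE_SIZE: usize = 4096;

    // Tables: pad to whole pages so per-page decommit never crosses tables.
    let table_bytes = 10_000; // hypothetical per-table footprint
    let table_slot = round_up(table_bytes, PAGE_SIZE);

    // Instances: only the type's alignment is needed, wasting far less space.
    let instance_bytes = 4_242; // hypothetical Instance + VMContext size
    let instance_slot = round_up(instance_bytes, 16);

    println!("table slot: {} bytes, instance slot: {} bytes", table_slot, instance_slot);
}
```

The gap between the two results is the fragmentation the review comment above is pointing at: page-rounding a ~10 KB table costs ~2 KB of padding per slot, whereas 16-byte alignment costs at most 15 bytes.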

This commit moves the tracking for faulted guard pages in a linear memory into
`Memory`.
This commit updates the error enums used in instantiation errors to encapsulate
an `anyhow::Error` rather than a string.
@peterhuene peterhuene merged commit f8cc824 into bytecodealliance:main Mar 8, 2021
@peterhuene peterhuene deleted the add-allocator branch March 8, 2021 20:20
cfallin added a commit to cfallin/wasmtime that referenced this pull request Feb 6, 2022
We currently skip some tests when running our qemu-based tests for
aarch64 and s390x. Qemu has broken madvise(MADV_DONTNEED) semantics --
specifically, it just ignores madvise() [1].

We could continue to whack-a-mole the tests whenever we create new
functionality that relies on madvise() semantics, but ideally we'd just
have emulation that properly emulates!

The earlier discussions on the qemu mailing list [2] had a proposed
patch for this, but (i) this patch doesn't seem to apply cleanly anymore
(it's 3.5 years old) and (ii) it's pretty complex due to the need to
handle qemu's ability to emulate differing page sizes on host and guest.

It turns out that we only really need this for CI when host and guest
have the same page size (4KiB), so we *could* just pass the madvise()s
through. I wouldn't expect such a patch to ever land upstream in qemu,
but it satisfies our needs I think. So this PR modifies our CI setup to
patch qemu before building it locally with a little one-off patch.

[1]
bytecodealliance#2518 (comment)

[2]
https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05416.html
cfallin added a commit to cfallin/wasmtime that referenced this pull request Feb 7, 2022
cfallin added a commit that referenced this pull request Feb 7, 2022
mpardesh pushed a commit to avanhatt/wasmtime that referenced this pull request Mar 17, 2022
Labels
cranelift:wasm cranelift Issues related to the Cranelift code generator wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:c-api Issues pertaining to the C API. wasmtime:docs Issues related to Wasmtime's documentation
6 participants