Skip to content
This repository has been archived by the owner on Mar 24, 2022. It is now read-only.

Add a "bounded runtime" mode to guest execution. #612

Merged
merged 2 commits into from
Jan 11, 2021

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Nov 21, 2020

This modifies the instruction-counting mechanism to track an
instruction-count bound as well; and when a bound is set, the Wasm guest
will make a hostcall to yield back to the host context.

This is all placed under the run_async() function, so bounded-execution
yields manifest as futures that are immediately ready to continue execution.
This should give an executor main loop a chance to do other work at regular
intervals.

lucet-runtime/src/c_api.rs Outdated Show resolved Hide resolved
@cfallin cfallin changed the title WIP: add a "bounded runtime" mode to guest execution. Add a "bounded runtime" mode to guest execution. Nov 23, 2020
@cfallin
Copy link
Member Author

cfallin commented Nov 30, 2020

Just made a few tweaks to (hopefully) get a green CI again; recent changes in wasmtime mean we'll need to fix a few things in Lucet before merging this PR.

  1. The GuestMemory API changed; @pchickey I think the impl of this trait in Lucet needs to be updated as well?
  2. There were some build errors related to hostcall types; @fst-crenshaw this is I think related to your work in Fixes #2418: Enhance wiggle to generate its UserErrorConverstion trait with a function that returns Result<abi_err, String> wasmtime#2419 -- will there be a Lucet update as part of that work too?

I'm happy to do either or both of these if needed, just let me know and I'll dig in further to understand the changes better :-) Thanks!

@iximeow
Copy link
Contributor

iximeow commented Nov 30, 2020

@cfallin i've cooked up a patch for point 1, and am figuring out point 2 at the moment - i expect we can piece those changes out with a wasmtime update independent of this PR landing, so we don't have to worry about tying up too much in one PR

lucet-wiggle/src/lib.rs Outdated Show resolved Hide resolved
@cfallin cfallin force-pushed the cfallin/bounded-execution branch 2 times, most recently from e4e95dc to 4e25235 Compare November 30, 2020 23:53
@cfallin
Copy link
Member Author

cfallin commented Nov 30, 2020

Rebased on top of #614; this should be ready for review now (thanks @iximeow for the quick work on that PR!).

Copy link
Contributor

@acfoltzer acfoltzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for getting this implemented! It's going to make a huge difference for async Lucet embedders.

I have a handful of suggestions here but nothing fundamental.

lucet-runtime/lucet-runtime-internals/src/instance.rs Outdated Show resolved Hide resolved
lucetc/src/compiler.rs Outdated Show resolved Hide resolved
lucetc/src/function.rs Outdated Show resolved Hide resolved
lucetc/src/function.rs Outdated Show resolved Hide resolved
lucetc/src/function.rs Outdated Show resolved Hide resolved
lucetc/src/function.rs Outdated Show resolved Hide resolved
@acfoltzer
Copy link
Contributor

Ah, it occurred to me that we also need a version of Instance::run_start() that has the same bounded time properties. @cfallin would you be able to add this before we merge?

This modifies the instruction-counting mechanism to track an
instruction-count bound as well; and when a bound is set, the Wasm guest
will make a hostcall to yield back to the host context.

This is all placed under the `run_async()` function, so bounded-execution
yields manifest as futures that are immediately ready to continue execution.
This should give an executor main loop a chance to do other work at regular
intervals.
@cfallin
Copy link
Member Author

cfallin commented Jan 11, 2021

Updated, and added the async Instance::run_async_start() as well.

This PR revises the simple instruction-count binding work, which
initially loaded and stored the instruction-count field from memory on
every basic block and emitted an extra conditional branch and yielding
block after every basic block, with two optimizations:

1. The instruction count is kept in a local variable (which the register
   allocator will ideally keep in a register unless spills are
   necessary), and this count is saved back to the field in memory only
   when necessary, e.g., before calls and returns.

2. The checks and conditional branches are inserted only in places where
   necessary to ensure bounded time between checks, at the cost of
   slightly more overrun/approximation in the bound: specifically,
   before calls and returns (to avoid unbounded runtime due to
   recursion) and at loop backedges.
Copy link
Contributor

@acfoltzer acfoltzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fast work on this!

@cfallin cfallin merged commit 3d8df65 into main Jan 11, 2021
@cfallin cfallin deleted the cfallin/bounded-execution branch January 11, 2021 19:47
@benaubin
Copy link
Contributor

Curious - would it be possible to expose these APIs publicly?

@cfallin
Copy link
Member Author

cfallin commented Jan 14, 2021

@benaubin run_async() is a public method on InstanceHandle already -- or did you mean some other part of the API?

@benaubin
Copy link
Contributor

benaubin commented Jan 14, 2021

@cfallin I'd love to have access to the InternalRunResult without using run_async in order to set an approximate CPU usage limit (a limit higher than reasonably expected, mostly to stop a while true {}). We plan to bill on CPU time, and can't do so with the run_async API.

@cfallin
Copy link
Member Author

cfallin commented Jan 14, 2021

@benaubin I see -- the InternalRunResult is an implementation detail (it's how the yield loop in run_async() is made aware of time-bound yields), so I don't think it makes sense to expose that type as-is.

If you are not also using the value-yield functionality in hostcalls, you can implement your use-case with run_async() as-is, I think: manually inspect the returned future and consider the result to be a timeout if it is not ready after the first invocation.

Otherwise, the right thing to do would be to have a separate run_with_timeout() API, I think: this is semantically different than the existing run methods, all of which run to completion (possibly with yields) and so its type needs to be different.

If the latter is what you need, I'm happy to review a PR! Unfortunately I don't have time to work on this myself at the moment though.

@benaubin
Copy link
Contributor

benaubin commented Jan 14, 2021

Happy to do a PR. Renaming InternalRunResult to BoundedRunResultand exposing it would provide a perfect run_with_timeout API as-is.

I'd be happy to build a timeout around CPU time as well, but it would require adding calls to libc for measuring the thread's cpu time - bloat that might not be applicable to all users.

@benaubin
Copy link
Contributor

Oh, I see what you mean! There would need to be a run method that returns BoundedRunResult. I'll add a run_bounded method.

@cfallin
Copy link
Member Author

cfallin commented Jan 15, 2021

@benaubin One other thing that I should probably note also, is that longer-term, the plan is to merge Lucet's functionality into Wasmtime (see blog post from Bytecode Alliance and another blog post on the merging efforts). There are active efforts to make this happen, e.g. async support in bytecodealliance/rfcs#2 and fast instance allocation in bytecodealliance/rfcs#5. In the meantime Lucet is absolutely supported, and as mentioned I'm happy to review a PR; but this may be relevant for your planning :-)

@benaubin
Copy link
Contributor

benaubin commented Jan 15, 2021

Thanks for the heads up! We saw that, but want lucet's Send instances. Our host call surface is purposefully minimal (instances mostly communicate with an external API server) - which should make switching less painful (and sandboxing slightly easier).

@benaubin benaubin mentioned this pull request Jan 15, 2021
@benaubin
Copy link
Contributor

@cfallin I've put an initial draft PR together (#625). Adds a few _bounded methods, and replaces InternalRunResult with an additional Error variant.

@benaubin
Copy link
Contributor

benaubin commented Jan 15, 2021

@cfallin On second thought, you were probably right that the InternalRunResult should remain a private implementation detail. I refactored the run_async_impl into a custom Future implementation, which gives the same benefit as exposing the bounded runtime apis, without complicating the API surface. Could you review PR #626?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants