Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose ModuleExport to avoid repeated string lookups #7828

Merged
merged 2 commits into from
Jan 28, 2024

Conversation

grovesNL
Copy link
Contributor

From a Zulip discussion a few months ago (https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Custom.20hashers) it was mentioned that custom hashers like rustc-hash should be ok for most usage of hashmaps/indexmaps because they don't need DoS protection.

exports showed up in a recent profile (e.g., when _get_export does instance.module().exports.get_full(name)?;) so I thought it might be good to start by adding rustc-hash here. Because this is used in get_export, this optimization should be useful generally.

@grovesNL grovesNL requested review from a team as code owners January 26, 2024 16:10
@grovesNL grovesNL requested review from alexcrichton and removed request for a team January 26, 2024 16:10
@alexcrichton
Copy link
Member

Thanks for this! This is a bit of an interesting question to me though because Wasmtime is in a different position than rustc and is exposed to untrusted modules and this could in theory be an issue for embedders. That being said I think it'd still be good to solve this. One idea would be to expose the underlying indices that Wasmtime uses. For example instead of looking up via name on instances each time you could look up the name on a Module once, get an index, and then use that index with instantiations of all future modules. That should in theory be even faster than a rustc-hash based approach.

Would a scheme like that work for your use case? If so would you be interested in adapting this PR to such an approach?

@grovesNL
Copy link
Contributor Author

Yeah that makes sense for embedders. In my case I'm both the producer and consumer of the module (e.g., I'm writing the export names during encoding, and then immediately looking them up from the instance) so DoS isn't important. It would be useful to override the hasher throughout most types in cases like this but I can appreciate not wanting to complicate type signatures too.

That scheme sounds ok, but I'm not sure how to expose the index we'd need without mapping bidirectionally between EntityIndex and usize somewhere. For example, exports is IndexMap<String, EntityIndex> so it doesn't seem like we have a great way to expose either EntityIndex or usize (the actual index for the map):

  • If we expose EntityIndex (the value for the index map) on the original wasmtime::Module, then we don't have a clear way to convert back from an EntityIndex to the usize needed to look it up on the instance later on. e.g., the EntityIndex to usize mapping isn't preserved during translate_payload so we can't recover it afterwards except by iterating all exports to find it.

  • If we expose usize, then we'd still have to do some kind of name to usize lookup after we have the instance, but that would have the same overhead as the original get_export name lookup.

Ideally I could get the EntityIndex from the original wasmtime::Module or query the usize by name from InstancePre somehow (so I could share it across all instances created from it).

@alexcrichton
Copy link
Member

Oh the raw value here that get_export is interested in is the ExportIndex. There's no need to recomputed the usize index of the name in the export map. For that either returning ExportIndex raw or perhaps a wrapper around it should be sufficient (along with a new method like get_export that takes this as an argument)

@grovesNL
Copy link
Contributor Author

If we add a function like this to wasmtime::Module:

pub fn get_export_index(&self, name: &str) -> Option<EntityIndex> {
    let module = self.compiled_module().module();
    module.exports.get(name).copied()
}

Then on Instance we might want want something like:

pub fn get_export_by_index(&self, mut store: impl AsContextMut, index: EntityIndex) -> Option<Extern> {
    // ... same boilerplate as `get_export` ...
    let item = unsafe { Extern::from_wasmtime_export(instance.get_export_by_index(index), store) };
    let data = &mut store[self.0];
    data.exports[i] = Some(item.clone());
    Some(item)
}

The part I'm not sure about is setting the lazily populated data.exports[i] with the export before we return it. We don't seem to be able to determine the index (i) from ExportIndex without searching the exports by value, because the exports indexmap is keyed by the export name, not its EntityIndex.

@alexcrichton
Copy link
Member

Aha I apologize, I had forgotten about that. That was why the usize you were talking about is important!

How about this:

impl Module {
    fn get_export(&self, name: &str) -> Option<ModuleExport>;
}

#[derive(Copy, Clone)]
pub struct ModuleExport {
    module: CompiledModuleId, // make sure this isn't mixed up with different modules
    entity: EntityIndex, // raw index into the wasm module
    export_name_index: usize, // the index within the `IndexMap`
}

impl Instance {
    pub fn get_module_export(&self, ..., export: &ModuleExport) -> Option<Export>;
}

(please bikeshed the names)

Basically package everything up in a new struct in the wasmtime crate

@alexcrichton
Copy link
Member

I suppose stepping back a little bit for a moment, one thing I would say is that in Wasmtime we've tried to basically avoid hash maps wherever we can. Where possible we use indexes into lists rather than strings to entries. While hash maps are necessary in the API somewhere it's often possible to lift the hash lookups outside of loops. Or at least that's what's worked out for us so far, not that we can't make the hash maps faster here just that we'd prefer to explore other avenues first (and also because even a faster hash map is probably much slower than a simple bounds check)

@grovesNL
Copy link
Contributor Author

grovesNL commented Jan 27, 2024

Thanks for clarifying! I updated the PR to match the scheme you described.

Is the purpose of including CompiledModuleId on ModuleExport just out of convenience for user code, or should we compare these somewhere during get_module_export?

@grovesNL grovesNL force-pushed the rustc-hash branch 2 times, most recently from 461b18b to fd3cb7a Compare January 27, 2024 02:16
@grovesNL grovesNL changed the title Use rustc-hash for module exports Expose ModuleExport to avoid repeated string lookups Jan 27, 2024
@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Jan 27, 2024
Copy link

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great thanks! Just a few minor comments but otherwise looks good to go.

Is the purpose of including CompiledModuleId on ModuleExport just out of convenience for user code, or should we compare these somewhere during get_module_export?

I commented below as well, but we'll want to check this when looking up exports to prevent this mistake of using the wrong module's exports to index an instance.

crates/wasmtime/src/instance.rs Show resolved Hide resolved
crates/wasmtime/src/module.rs Outdated Show resolved Hide resolved
@grovesNL
Copy link
Contributor Author

Thank you for the feedback! I updated the PR. I moved the lookup into the shared _get_export function and pass the entity and index to that instead (checking the module ID is only necessary for the new function).

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@alexcrichton alexcrichton added this pull request to the merge queue Jan 28, 2024
Merged via the queue into bytecodealliance:main with commit 04adffd Jan 28, 2024
18 checks passed
@grovesNL grovesNL deleted the rustc-hash branch January 28, 2024 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants