Feature request: command virtualization by argv[0] #39

Open
whitequark opened this issue Jan 10, 2024 · 7 comments

@whitequark
Contributor

WASI-Virt allows me to "bake" resource/data files into an executable, which is very handy for YoWASP. But there is an issue: I have a few cases where several small, related executables depend on the exact same set of very large (tens to hundreds of MB) data files. Obviously I do not want to duplicate them.

This calls for a virtualization mode where several commands are combined into one multi-call binary and dispatched based on argv[0] (or its suffix, to handle the case where you prepend a platform name to an executable's name), like BusyBox famously does.
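For illustration, here is a minimal sketch of the BusyBox-style dispatch described above. It is not how WASI-Virt works or would work; the applet names `foo` and `bar`, the `-` separator for platform prefixes, and the exit code 127 for an unknown applet are all assumptions made for the example.

```python
import os
import sys

# Hypothetical applets; in a real multi-call binary these would be the
# entry points of the individual commands.
def run_foo(args):
    print("foo called with", args)
    return 0

def run_bar(args):
    print("bar called with", args)
    return 0

APPLETS = {"foo": run_foo, "bar": run_bar}

def select_applet(argv0, applets=APPLETS):
    """Return the applet name selected by argv0, or None if nothing matches."""
    # Ignore any directory part and a trailing extension such as ".wasm".
    stem, _ext = os.path.splitext(os.path.basename(argv0))
    if stem in applets:
        return stem
    # Suffix match, so that a platform prefix (assumed here to be joined
    # with "-") still selects the right applet. Longest names are tried first.
    for candidate in sorted(applets, key=len, reverse=True):
        if stem.endswith("-" + candidate):
            return candidate
    return None

def main(argv):
    applet = select_applet(argv[0])
    if applet is None:
        print(f"unknown applet: {argv[0]}", file=sys.stderr)
        return 127
    # The selected applet sees only its own arguments.
    return APPLETS[applet](argv[1:])
```

Calling `main(["x86_64-linux-foo", "--help"])` runs the `foo` applet with `["--help"]`, while `main(["unknown"])` reports an error and returns 127. All applets live in one binary, so any data baked into it is stored once.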

@guybedford
Collaborator

In theory it should be possible to do data section sharing with component model imports, where a separate binary could just have the data core module. I'm not sure exactly how to do it, or if there are low-level limitations, but that might be an interesting approach to explore.

@whitequark
Contributor Author

I think this will automatically happen if I'm using WASI filesystem virtualization combined with argv0 virtualization, since all of the large data will be in the virtualization stub and the individual executables will dispatch to it when they load it. Am I missing something?

@guybedford
Collaborator

This would be low-level binary optimization stuff, but I wonder if there's a way for the virtualization stub that has the data segment to somehow import that itself in a way that can be shared with the other binaries that have that same data segment.

@lukewagner may be able to provide some further feedback here, as to whether this is possible or how it might also be achieved.

@whitequark
Contributor Author

Oh, I see; so the idea would be to share data at the file level (e.g. address the .wasm files by content) rather than to put things into one component?

It is kind of desirable for me to put several binaries into one component anyway as a part of my ongoing effort to do as much stuff as possible within Wasm itself and not in wrappers around it (ideally I want an entire toolchain as a single .wasm file, including internal binaries spawned via popen()).

@guybedford
Collaborator

Because of component model composability, anything that works for multiple component files will always be able to be composed into a single component binary file.

So I think the solution space is still the same?

That is, even with a single binary, you don't want that single file to contain multiple copies of the same data internally, so you need an approach for sharing it.

Having WASI-Virt support multiple command virtualizations at the same time is a really interesting way of approaching it too, though. I guess I would need to think through the various use cases in more detail. The argv[0] splitting feels quite specific, given that virtualization isn't currently confined to commands.

@whitequark
Contributor Author

Oh yeah, I do want argv[0] virtualization personally for my specific use case but I am by no means insisting that it should be limited to that. It can be a quite generic composition mechanism.

@lukewagner

Core modules and components can't import data segments directly (although we should really consider adding this!), but we can fairly easily approximate this ability by wrapping the shared data segment with a thin core module that imports a linear memory and exports functions that call memory.init for the given data segment on the imported memory. Such a core module can then be imported by any number of components (noting that core modules are immutable and thus fine for import by components). I think this technique would allow you to factor out shared static assets from multiple components.
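As a rough mental model of this technique (not real WebAssembly, and not an existing API), the Python sketch below stands in for the thin core module. `AssetsModule` and `AssetsInstance` are hypothetical names: the module owns one immutable data segment, each instantiation binds it to a different "imported" memory, and `init` copies a range of the segment into that memory with the same bounds checks that make `memory.init` trap.

```python
class AssetsInstance:
    """One instantiation of the shared module, bound to an imported memory."""

    def __init__(self, segment: bytes, memory: bytearray):
        self._segment = segment   # shared, never modified
        self._memory = memory     # the importer's own linear memory

    def init(self, dest: int, src: int, length: int) -> None:
        """Copy segment[src:src+length] to memory[dest:dest+length].

        Mirrors memory.init: an out-of-bounds access raises (a trap in
        Wasm) and leaves the memory untouched.
        """
        if min(dest, src, length) < 0:
            raise IndexError("negative offset or length")
        if src + length > len(self._segment) or dest + length > len(self._memory):
            raise IndexError("out of bounds (a trap in Wasm)")
        self._memory[dest:dest + length] = self._segment[src:src + length]


class AssetsModule:
    """Models the thin, immutable core module wrapping the data segment."""

    def __init__(self, segment: bytes):
        self._segment = bytes(segment)

    def instantiate(self, memory: bytearray) -> AssetsInstance:
        # Every importer supplies its own memory; the segment is not copied.
        return AssetsInstance(self._segment, memory)
```

Two components would each call `instantiate` with their own memory and then pull in only the ranges they need, so the large assets exist once in the shared module rather than once per component.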


3 participants