Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Meta] Plugin system for 2.0 #2635

Closed
kdy1 opened this issue Nov 3, 2021 · 29 comments
Closed

[Meta] Plugin system for 2.0 #2635

kdy1 opened this issue Nov 3, 2021 · 29 comments

Comments

@kdy1
Copy link
Member

kdy1 commented Nov 3, 2021

I will modify this as required.

Javascript plugins

If it's possible to make it fast, I'll support js plugins, too.

Rust-based plugins

Tasks

Task: Proxy swc_common (#2646)

  • swc_common: Create a trait for thread-local APIs. (cargo feature plugin-base)
  • swc_common: Implement the trait. (cargo feature plugin-rt)
  • swc_common: Replace APIs in plugin mode. (cargo feature plugin-mode)
  • Pass the trait to plugins and use it to replace swc_common APIs.

Task: rplugin: System to pass AST directly (#2671)

This is important for performance.

Task: Share more input, like SourceMap or SourceFile.

Task: Actaully use rplugin.

Task: Improve plugin runner

Task: Add plugin feature to swc

Task: Expose node apis

Design

I experimented lots of ways for plugin system, and decided to go with abi_stable. But there's a problem.

Problem: Identical API

I want to provide identical API as core transforms. And it requires sharing data stored in thread-local variables.

Triage: Dynamic library

I initially thought I need to make swc_common a dynamic library so plugins can invoke it. But dynamic linking does not fit for swc plugins. As swc_common depends on std, the version of rustc should be all same among plugins, and chaning version of rustc becomes a huge breaking change. I don't want to break all plugins by upgrading rustc. So wee need another way than a dynamic library.


I posted some questions on the rust community while trying this option, in case you are curious about what I tried.

I also experimented with extern "C". See https://github.com/kdy1/rust-dylib-test

Solution: Pass function to plugin with stable abi

Instead of making swc_common dynamic library, we can pass a big trait that contains functions related to thread-local variables in swc_common.

The plugin runner will pass this trait while invoking plugins. abi_stable can be used to do this.

For plugins, operations of swc_common will be replaced by the trait. In other words, functions in swc_common will work by calling a method in the passed trait object.


@kdy1 kdy1 added this to the v2.0 milestone Nov 3, 2021
@kdy1 kdy1 pinned this issue Nov 3, 2021
@kdy1
Copy link
Member Author

kdy1 commented Nov 8, 2021

Related PR: #2671


I made a crate to pass AST nodes as a special form to the plugin and this is the result I got.

Due to limitation of testing system,ast_clone_to_stable performsclone + to_stable andast_clone_to_stable_then_to_unstable performsclone + to_stable + to_unstable.
Time used for the clone is benchmarked by ast_clone, so you can subtract it.

I also profiled the performance of passing data to plugins using JSON

image

.

Note that in the real-world, clone is not required.

Deleted

I forgot to configure the memory allocator. So I updated the image above.

@devongovett
Copy link
Contributor

Excited to see progress here! I think there's still one potential issue with this approach. Adding/updating/removing fields from AST nodes is potentially a breaking change when types are nested without boxes. For example:

struct Foo {
  foo: u32,
  bar: Bar,
  baz: u32
}

struct Bar {
  test: u32
}

Now say you add another field to the Bar struct. On its own this is ok if you add it at the end, but this changes the memory layout of Foo as well. Specifically, the fields after bar would be shifted. So any plugins compiled against an older version of SWC won't work anymore.

To avoid this you'd have to box every single nested struct (which might not be great for perf), and also make sure that you only ever add fields at the end.

Another way could be to make accessor functions and have plugins only use those instead of accessing fields directly. Eg get_bar(foo) instead of foo.bar. That would leave the implementation up to SWC core and allow older plugins to still work with a newer SWC (assuming fields aren't removed). Not sure the perf impact of doing that though, and you'd want to do some codegen to make this not super tedious for plugin devs to use.

Or maybe you just decide that AST changes can only be done in major versions. Not sure how often they occur.

@kdy1
Copy link
Member Author

kdy1 commented Nov 8, 2021

@devongovett Thank you for the feedback!

Actually abi_stable supports adding fields without breaking layout.

And It also supports making enums non-exhaustive so we can add variants without breaking plugins.

I'll make all fields of plugin ast private so adding fields is not a breaking change for rust, too.


Problem is that the current design makes 'unknown' variants and fields dropped by the plugin.

Thoughts:

  • We can force people to use plugin ast for their transforms so it never breaks, but I'm not sure about this at the moment.

  • Most of ast changes are typescript extensions. As plugins are invoked after stripping types out, so excluding AST nodes related to typescript from plugin ast is also possible.

  • (Not sure) AST changes comes with new transforms to remove the added AST nodes.
    We might detect the version of plugin, and invoke the plugin after removing them. But I think this will make the code much more compilcated.

How do you think about these ideas?

@devongovett
Copy link
Contributor

Ah interesting, I missed that. Looking at the docs it seems all field accesses end up becoming methods (returning Option if they were added after the last major), which is close to what I was suggesting.

We can force people to use plugin ast for their transforms so it never breaks, but I'm not sure about this at the moment.

What are the downsides of this? It seems like this would avoid some extra AST conversions as well, so maybe better perf too? I guess the difference would be that the API is slightly different for plugins vs in SWC core, but I think this is ok (and maybe expected?). I would think you would want to expose a more limited API to plugins anyway to avoid maintenance problems. Babel had this problem - they exposed everything in core to plugins, and were stuck not being able to make changes because plugins started depending on too many internals. I would recommend being quite intentional about what is exposed to plugins. Limit it at first and expand over time as things get figured out and requested by plugin authors.

Most of ast changes are typescript extensions. As plugins are invoked after stripping types out, so excluding AST nodes related to typescript from plugin ast is also possible.

I could imagine plugins wanting to run before stripping TS as well, e.g. a documentation generator, or code mod system.

We might detect the version of plugin, and invoke the plugin after removing them. But I think this will make the code much more compilcated.

Yeah that sounds hard.

@kdy1
Copy link
Member Author

kdy1 commented Nov 9, 2021

Ah interesting, I missed that. Looking at the docs it seems all field accesses end up becoming methods (returning Option if they were added after the last major), which is close to what I was suggesting.

Oh, I see. Yeah, something similar to method will be required to handle such cases. (Logically, didn't think about how does it work)

What are the downsides of this? It seems like this would avoid some extra AST conversions as well, so maybe better perf too? I guess the difference would be that the API is slightly different for plugins vs in SWC core, but I think this is ok (and maybe expected?). I would think you would want to expose a more limited API to plugins anyway to avoid maintenance problems. Babel had this problem - they exposed everything in core to plugins, and were stuck not being able to make changes because plugins started depending on too many internals. I would recommend being quite intentional about what is exposed to plugins. Limit it at first and expand over time as things get figured out and requested by plugin authors.

Exactly. It will avoid lots of conversions. I was simply trying to expose identical API as core transforms, but I think you are right. The problem of babel is quite interesting. Thank you so much for sharing it.


I concluded that exposing plugin ast to plugins is better than exposing normal ast to the plugin, thanks to your comment. I'll implement a visitor generator for plugin ast.

@kdy1
Copy link
Member Author

kdy1 commented Nov 9, 2021

I found that prefix type requires using _Ref types everywhere, because of the layout issue you mentioned.
I think this can make implementing plugin harder, so I'll investigate.

@kdy1
Copy link
Member Author

kdy1 commented Nov 9, 2021

One downside of using plugin ast is split of the libraries - normal ast vs plugin ast

@dwoznicki
Copy link
Contributor

Which libraries would use the normal ast? Anything outside of the core? Also, are you planning for the plugin ast to be a perfect subset of the normal ast, or will it have to be a little different?

@kdy1
Copy link
Member Author

kdy1 commented Nov 10, 2021

normal ast is ast used by core.
I'm not sure about the plugin ast at the moment.

@dwoznicki
Copy link
Contributor

If the two ASTs are similar enough, perhaps you could make a single swc-ast crate and use #[cfg] to create normal and plugin compilation targets.

@kdy1
Copy link
Member Author

kdy1 commented Nov 11, 2021

@dwoznicki Sadly it's not the case, and as feature flag is global, it will break lots of code.

@kdy1 kdy1 unpinned this issue Nov 22, 2021
@kdy1 kdy1 removed this from the v2.0 milestone Nov 27, 2021
@siyi98
Copy link

siyi98 commented Dec 3, 2021

@kdy1 Will Plugin 2.0 support the lifecycle of Babel's pre and post?

@kdy1
Copy link
Member Author

kdy1 commented Dec 31, 2021

I did some more investigation.

I switched wasm runtime to wasmer, and checked if caching compiled bytecode can be a solution to perf issue.

Time for loading + 100 invocation:
(These invocations are done in a single process)


dylib: 287.170921ms
First run of wasm: 1.988753484s
Other runs of wasm: 417.329559ms

plugin system based on a dynamic library is faster, but it's really a headache to publish the plugin to all os, arch and abis.

Additional notes

  • Cached bytecode is target-specific and can be prebuilt

But this will not be officially supported, at least for now.

  • Post-install script will be used to compile wasm to machine-specific bytecode.

  • We may detect llvm in the PATH and optimize the wasm to make it even faster.

This is just possible, and not going to be implemented in the initial version of v2.

  • We can force precompilation of wasm files to remove initial compile-time and reduce the binary size of @swc/core. (In future)
Current:
    28117216 bytes (= 26.8 MB)

wasmer + codegen:
    32716920 bytes (= 31.2MB)

wasmer + without codegen:
    28612496 bytes (27.3 MB)

@kwonoj
Copy link
Member

kwonoj commented Dec 31, 2021

TL:DR, it is a major shift but we'd like to propose plugin interface based on wasm instead of native dylib binaries.

There are notable wins as shared above if we can accept some of costs of cold startup time. Still SWC will need to care about breaking changes around AST itself but abi becomes lot stable, also it doesn't require to build several native binaries per each platform.

As same as current codesbases, this plugin api changes will live under experimantal flag. We'd like to get some real world feedback until we stablize interface. Also meanwhile core team need to prepare some guidances such as template-repo or bootstrap codes to write plugins, etcs.

@devongovett
Copy link
Contributor

A huge downside of WASM is that there is no memory sharing so you have to serialize and copy everything into and out of WASM's memory. That will be the biggest performance bottleneck of this approach I think. Probably not much better than writing plugins in JS. In your comparison you are doing this in both wasm and dylib versions, but if your dylib didn't serialize/deserialize and instead used shared memory it would likely be much faster.

@kdy1
Copy link
Member Author

kdy1 commented Dec 31, 2021

Actually, dylib version also serializes and deserializes.
https://github.com/kdy1/wasm-perf

But you are right, most of the time is used by serde, both for wasm and dylib-based plugin.

@devongovett
Copy link
Contributor

Actually, dylib version also serializes and deserializes.

Yeah that's what I meant.

@kwonoj
Copy link
Member

kwonoj commented Dec 31, 2021

Unfortunate thing in here is we really don't have a satisfactory solution to solve all the problems we have. We either have to suffer with building native binaries per platform or bite (some) of the costs by allowing portable formats like wasm.

@kdy1 did some quick measurement with current serialization approaches and numbers are indeed not something we can easily accept. However, so far it looks like the major cost comes from serialization itself, not for the part of copying bytes across host-to-guest memory spaces. If it's true, and if somehow acceptably faster serialization approach can be adapted we may be able to pursue wasm based plugin approach. I am aware these are based on a lot of if and I do not think we can conclude at this point though.

The reason I'm in favor of wasm-based approach is it definitely lowers barrier to the ecosystem by reducing some of pain points. Writing non-js-plugins is already a barrier from the beginning, but logistics around building several native binaries & publishing add a lot more complex problems on top of that. Wasm at least consolidates build targets as well as allowing several languages other than rust.

@devongovett
Copy link
Contributor

Sure. I think it would be pretty simple to allow both though. The API would be identical between native and WASM for plugin authors, and it seems like it would be pretty easy to support loading both dylibs and WASM within SWC. That gives plugin authors the choice.

@kwonoj
Copy link
Member

kwonoj commented Dec 31, 2021

I would like to avoid to open up SWC attempts to support both due to we don't have a single solution satisfies all. I mean we may eventually land there, but that means SWC should support both ways to work gracefully for the authors and consumers of the plugin or otherwise it'll be hard ecosystem fragments for the users. Opening up both interfaces gives flexibility to the author but users cannot choose which type of plugin to use unless there are identical plugins written in both ways, and in the worst cases, it can cause both problems like some plugins written in wasm degrades perf and some other native plugins exposes issues around native binaries like not having proper prebuilt binaries for the platform.

For now I'd like to push forward one approach first and see what kind of blockers we'll see. Decisions can be made based on the results.

@kdy1
Copy link
Member Author

kdy1 commented Jan 2, 2022

I think we can support wasm plugin without regressing performance.
I optimized serde itself to solve the perf issue, but I found a better way 🤣

I used rkyv for serialization and deserialization.

test ast_clone                            ... bench:      14,206 ns/iter (+/- 3,236)
test ast_clone_to_stable                  ... bench:      44,502 ns/iter (+/- 1,829)
test ast_clone_to_stable_then_to_unstable ... bench:      79,108 ns/iter (+/- 9,551)
test json_deserialize                     ... bench:     356,454 ns/iter (+/- 15,117)
test json_serialize                       ... bench:      70,993 ns/iter (+/- 10,515)
test rkyv_deserialize                     ... bench:      25,982 ns/iter (+/- 4,788)
test rkyv_serialize                       ... bench:      19,425 ns/iter (+/- 2,026)
  • 79,108ns - 14,206ns is the time used for converting AST into a stable form, which can be passed to dylib via abi_stable. So it's the overhead of the plugin system based on abi_stable.

  • 19,425ns + 25,982ns is the overhead of the plugin system based on rkyv.

This does not solve the AST compatibility issue and the compatibility issue requires more investigation.

@syrusakbary
Copy link

I just saw the integration of Wasmer into SWC... it's awesome!

PS: one of the advantages of using Wasmer is that also works fully on the browser via wasm-bindgen, so it can run properly on the SWC playground. Not sure if you have already tried it, but we can help on make it work!

We are building some services that will help on automatically cross-compiling the wasm modules on the cloud.
This way, in the future SWC could ship a headless version of wasmer (without any compilers attached), fetch the optimized precompiled wasm module from the cloud (in the Post-install script, for example using the LLVM cloud-compiled version), and run it at ultra-fast speeds locally without needing to have LLVM installed locally.

Would that be interesting for SWC?

@kdy1
Copy link
Member Author

kdy1 commented Jan 5, 2022

Yes, and I want it now actually :(
cranelift-codegen breaks deployment because it uses too much thread-local storage space lol.

Seriously, we were considering something similar, although it is a simple subcommand and uses the CI system the author choose. If this process can be easier, I'd love to see some integrations.

@kdy1
Copy link
Member Author

kdy1 commented Jan 5, 2022

@syrusakbary One more thing. Plugin authors will typically be users, but not customers. Is it fine?
Curious because I'm thinking about ways to make open-source more sustainable.

@syrusakbary
Copy link

Plugin authors will typically be users, but not customers. Is it fine?

Yeah, that is completely fine.
For the initial prototype, if we aim to have the plugins published in wapm we could automatically pre-compile them to the desired architectures upon publishing (we have a Github action to make publishing trivial in CI), and we could create a "swc" plugins page in the website (WAPM API is also completely open and available through a public GraphQL endpoint, so there can be a custom frontend, for example, in the swc.rs website or in the swc CLI).

Do you think that would work for your use case @kdy1 ?

@kwonoj
Copy link
Member

kwonoj commented Jan 5, 2022

Not kdy1 (😅 ) but it sounds reasonably workflow.

I have one question regarding on those - How does it affects for the local / CI dev for the consumer point of view? Originally, our assumption is simply let each plugin is being published into npm registry and it can be listed under npm's package management for the dependency management. Peeking wapm bit, it looks like it needs separate pkg mgr for wapm registry (or either install wasmer)?

@kdy1
Copy link
Member Author

kdy1 commented Jan 6, 2022

I'm not sure about wapm at the moment, but it sounds reasonably simple so I think it will work.

@kwonoj
Copy link
Member

kwonoj commented Jan 21, 2022

@kdy1 I think it's ok to close this issue as general circumstances for the issue has been changed and superseded by #3167 . I did close it by myself, however if I understood something wrong or miss anything please reopen again.

@swc-bot
Copy link
Collaborator

swc-bot commented Oct 19, 2022

This closed issue has been automatically locked because it had no new activity for a month. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.

@swc-project swc-project locked as resolved and limited conversation to collaborators Oct 19, 2022
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

7 participants