Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue: Worker support #13143

Closed
addaleax opened this issue May 21, 2017 · 35 comments
Closed

Tracking issue: Worker support #13143

addaleax opened this issue May 21, 2017 · 35 comments
Labels
feature request Issues that request new features to be added to Node.js. worker Issues and PRs related to Worker support.

Comments

@addaleax
Copy link
Member

I’m opening this as a discussion issue, as proposed in #2133 (comment), to see whether and how we can get (Web)Worker support into Node core.

This feature would be comparatively large, there’s quite a bit of pre-existing work that we will want to look at, and I imagine we might want to do an enhancement proposal (EP) first to make sure we don’t engage in too much unnecessary work. I imagine the roadmap is something like this:

  1. Create a new repository in the org to work under, for code, working out the EP text, discussion issues, and so on. This is something I can just do unless anybody objects to that in the near future.
  2. Find volunteers who are willing to do parts of the work, maybe even set up somewhat regular meetings.
  3. See if there are any immediate objections to a feature of this kind becoming a part of Node core.
  4. Figure out what the API should look like; How close to the WebWorker standard could and should we get? How much of Node’s standard modules should be available to workers (e.g.: we’d probably want timers but no I/O modules? or maybe we would want those?)? SharedArrayBuffers? Other magic?
  5. Take a look at work that has been done elsewhere, e.g. workers: initial implementation #2133 or @audreyt’s https://github.com/audreyt/node-webworker-threads, and see how much can be used, what went well/not so well, etc.; Also, I know @bnoordhuis has looked into this a lot, so it would be really good to hear his thoughts.
  6. Write an EP based on discussion around those topics.
  7. Get the CTC to approve that EP.
  8. Work on the actual code, documentation, tests, etc. in that extra repository.
  9. Open a pull request against this repository (nodejs/node) with the new code.

I can probably lead a bit of the effort, and I’ve considered doing this work alone, but so far it always ended at “this would be too much work for me to do on my own”; so if you want to help, please do, but expect to spend some time on this. ❤️

/cc @nodejs/collaborators @petkaantonov @NawarA @pemrouz

@addaleax addaleax added the feature request Issues that request new features to be added to Node.js. label May 21, 2017
@vkurchatkin
Copy link
Contributor

Count me in. That's something that I've wanted for a long time to happen.

@cjihrig
Copy link
Contributor

cjihrig commented May 21, 2017

I'm interested.

1 similar comment
@not-an-aardvark
Copy link
Contributor

I'm interested.

@aqrln
Copy link
Contributor

aqrln commented May 21, 2017

If there is anything I could be of help, I'll be glad to be involved.

@bnoordhuis
Copy link
Member

How much of Node’s standard modules should be available to workers

In a multi-process model it's easy and cheap to expose everything.

In a multi-thread model a lot of work has to be done for seemingly trivial things. Example: the current working directory is a per-process property, not per-thread. Node needs to be taught to maintain a per-thread working directory and all file system operations must use it henceforth. That requires major surgery to the fs module, process.chdir(), the module loader, etc. Native add-ons probably cannot be made to work at all.

Another problem with the multi-thread model that the multi-process model doesn't have is that the address space on 32 bits architectures is limited to ~2 GB. A V8 isolate needs a lot of virtual memory to do anything interesting. 10 or 20 threads tops will exhaust the address space and kill the process.

I'd look into the multi-process model + shared memory. That lets you support SharedArrayBuffer without the headaches of the multi-thread model.

Cross-platform shared memory has issues of its own but they are tractable: it's mostly working around platform quirks and limitations. Drudge work but nothing that is insurmountable.

@eljefedelrodeodeljefe
Copy link
Contributor

I was also researching on it, basically concluding to wait for SharedArrayBuffer and then to do a multi-process model. What would help is having Node as shared library.

@pemrouz
Copy link

pemrouz commented May 22, 2017

Thanks for taking a lead on this @addaleax! You can count on me too :)

@NawarA
Copy link

NawarA commented May 22, 2017

+1

I've produced a model of communication that I believe is simple enough, happy to share and help work on some architecture and code. Working together, I'm sure we can produce something simple and great

@refack
Copy link
Contributor

refack commented May 22, 2017

Sounds like a very interesting venture.
Would love to help!

@refack
Copy link
Contributor

refack commented May 22, 2017

I'd look into the multi-process model + shared memory.

That sound like a very JSy concept. I'm +10
(Which makes this a little like turning cluster into standard complaint and implementing on Windows, for which I'm +100)

@zimbabao
Copy link
Contributor

Very interested on working on this, count me in.

@bmeck
Copy link
Member

bmeck commented May 23, 2017

I'd be interested but only if we don't implement process level shared mutable state (like what @bnoordhuis was getting at). I'd also lean on not supporting any specific module system (CJS or ESM) in first implementation.

@refack
Copy link
Contributor

refack commented May 23, 2017

I'd be interested but only if we don't implement process level shared mutable state (like what @bnoordhuis was getting at). I'd also lean on not supporting any specific module system (CJS or ESM) in first implementation.

@mbeck that interesting...
I'm interested in your POV, could you elaborate on two points?

  1. Would you prefer threads with shared mutable state? Or multi-process but with serialized communication?
  2. If no modules, then code can only be passed from the master (a la new Worker(runFunc))?

@mhdawson
Copy link
Member

@JiapengZhu this may line up with some of the performance investigation you are looking at.

@addaleax
Copy link
Member Author

→ I just made https://github.com/nodejs/worker :) Thanks for your support, let’s hope we get somewhere with this.

@JiapengZhu
Copy link

@mhdawson sounds good, i'm interested in its performance investigation

@yorkie
Copy link
Contributor

yorkie commented Sep 28, 2017

Hi I have implemented the Web Worker standard with threading at https://github.com/Rokid/node-webworker-ng which uses the new v8's serialization API to transmit between threads, and implements the modular development at worker.

Another problem with the multi-thread model that the multi-process model doesn't have is that the address space on 32 bits architectures is limited to ~2 GB. A V8 isolate needs a lot of virtual memory to do anything interesting. 10 or 20 threads tops will exhaust the address space and kill the process.

To address this issue, I'm tend to make the worker be a real scripting place, thus we only need maintain a minimal objects in every worker and could use non-v8 engine to support the large number of threads.

@TimothyGu
Copy link
Member

@yorkie, @addaleax has also created a Worker implementation at ayojs/ayo#58, with full SharedArrayBuffer and ArrayBuffer transfer support. From a quick peek there are certainly a lot of different design choices, but it would be good to see where one excels and the other falls short.

@jasnell
Copy link
Member

jasnell commented Sep 28, 2017

Absolutely awesome that we've ended up with a couple of efforts looking at this. It will be great to see these coalesce into support in core. Definitely all for it! I'm wondering, @yorkie are you planning to be at the collaborator summit in Vancouver BC next week after the Node.js Interactive conference? If so, I'd love to schedule some time for those of us interested in this to meet up and talk through it face to face.

@TimothyGu
Copy link
Member

@bnoordhuis

It's very important to remember the original intention of the API. The ECMAScript spec defines the memory model in terms of agents and agent clusters, which are intended to be mapped to threads and processes while providing an abstraction for possible future flexibility. This is the way browsers implement it, and if Node.js is the only implementer that does not implement it that way may cause problems in the future as the rest of the web platform moves forward.

In a multi-thread model a lot of work has to be done for seemingly trivial things. Example: the current working directory is a per-process property, not per-thread. Node needs to be taught to maintain a per-thread working directory and all file system operations must use it henceforth. That requires major surgery to the fs module, process.chdir(), the module loader, etc.

I don't see why this kind of feature is necessary for workers. If workers want to change something, they should use message passing with the main thread. A worker, well, works.

Native add-ons probably cannot be made to work at all.

@addaleax did some work so that only addons that explicitly declares themselves as multi isolate-ready. I'll defer to her about how things work given that.

@bnoordhuis
Copy link
Member

I touched on that in #2133 (comment) (disjoint discussions ftw):

[..] WebWorkers-style parallelism is still an option and not terribly hard to implement but I didn't see a point in pursuing that in core, there are already add-ons that do.

I have nothing against the WebWorkers model per se, but:

  1. it doesn't need to be implemented in node.js core, and
  2. there will inevitably be scope creep when it is.

@jasnell
Copy link
Member

jasnell commented Oct 8, 2017

I think this discussion points to a couple of bits that still need to be worked through...

  1. Those of us who really want to see this land in core should take some time to articulate why. We will need to do so anyway in some form when we begin talking to end users and encouraging them to use it. What are the use cases we are solving for.

  2. We need to answer a couple of very fundamental questions, one if which is: should a worker have direct access to system i/o. If the answer is, it depends, then what does it depend on? Will a worker on core be expected to respond to network requests? Will it be expected to write to files? Will it be expected to do crypto? Will workers be capable of requiring native modules? We should make sure we have at least an idea of what these answers should be.

  3. Building on the first two points, it would be a worthwhile exercise to have a reference example of an application in which workers running in core are an obvious benefit.

This is all stuff that we're going to just need anyway as we go through this process so I really hope no one would feel like this is just throwing unnecessary walls up. I would really like to see this land, it's just clear that we need to do more than just write the cool code that makes it work.

@benjamingr
Copy link
Member

@jasnell (cc @bnoordhuis )

Those of us who really want to see this land in core should take some time to articulate why. We will need to do so anyway in some form when we begin talking to end users and encouraging them to use it. What are the use cases we are solving for.

What do you mean? There is currently no way to utilize all cores on the same memory in Node.js, workers let you do that - letting Node.js support a whole variety of use cases it has traditionally not been able to support.

For me half of the point is SharedArrayBuffer, this means you can share memory between worker threads in Node.js. The API itself isn't the most critical part, but keep in mind:

  • As you know, you can't share objects between isolates since objects live in an isolate's heap.
  • You can however, share raw bytes.
  • The v8 mechanism already exists and works in browsers (SharedArrayBuffer).
  • Anna's PR uses that mechanism to enable parallelism.

This is about adding a capability to Node.js. Now, it's certainly not the only API but given we're not going to implement locking throughout V8 and V8 won't implement threading with objects the way Nashhorn supports because it makes no sense for the DOM web platform - I think this is about the best we can do.

Optimally, I'd like something like sharing objects (and closures) between threads, synchronization primitives like a Mutex and other concurrency primitives - but given how V8 is built that is likely to never happen. Given that - the worker model with shared memory through SharedArrayBuffer is the next best thing.

@jasnell
Copy link
Member

jasnell commented Oct 9, 2017

What do you mean....

Yes, I know all of those reasons, and agree with them. I Was not asking what the use cases are, I was saying that we need to document them rather than assuming that everyone will consider it obvious. This is a significant new capability, code is fantastic, code with motivating and documented detail is even more so.

@bnoordhuis
Copy link
Member

synchronization primitives like a Mutex and other concurrency primitives

Minor tangent: mutexes, rwlocks, condition variables, barriers, etc. can be constructed out of Atomics.wait() and Atomics.wake().

@benjamingr
Copy link
Member

Minor tangent: mutexes, rwlocks, condition variables, barriers, etc. can be constructed out of Atomics.wait() and Atomics.wake().

First of all thanks, I didn't know that. Second of all - IIUC it would still only allow sharing memory in a SharedArrayBuffer (raw, binary(ish) data) - right?

(At which point the API of Atomics.wait() and Atomics.wake() seems sufficient)

@bnoordhuis
Copy link
Member

@benjamingr Yes, that's right.

@yorkie
Copy link
Contributor

yorkie commented Jan 9, 2018

Absolutely awesome that we've ended up with a couple of efforts looking at this. It will be great to see these coalesce into support in core. Definitely all for it! I'm wondering, @yorkie are you planning to be at the collaborator summit in Vancouver BC next week after the Node.js Interactive conference? If so, I'd love to schedule some time for those of us interested in this to meet up and talk through it face to face.

Sorry about missing you message @jasnell, I'd glad to merge Rokid/node-webworker into core if possible, could we have a discussion about this? Do you have any suggestion?

/cc @TimothyGu @addaleax

@devsnek
Copy link
Member

devsnek commented Jan 9, 2018

if we can make workers use esm imo it would be much nicer than monkey-patching in some require-like method. as to the stuff further up, letting these "workers" handle io, including sockets, would be a pretty cool feature, as annoying as it would be to implement.

and as a side note, i think every thread implementation i've seen for v8/node has the same issues with v8's heap allocation (including mine and @yorkie's)

@yorkie
Copy link
Contributor

yorkie commented Jan 9, 2018

Handling I/O including sockets should be done at WebWorker and easy to implement, but that's just handling in worker, we have to listen connections on master.

As for ESM in worker, that's good idea, and it could be separately implemented inside worker place :)

@TimothyGu
Copy link
Member

TimothyGu commented Jan 9, 2018

I believe ESM in workers are already supported in the https://github.com/ayojs/ayo implementation.

@devsnek
Copy link
Member

devsnek commented Jan 9, 2018

those look pretty nice, and they handle seem those heap allocation issues i mentioned earlier, perhaps @addaleax you can look into what needs to be done to get something similar in node core?

@addaleax addaleax added the worker Issues and PRs related to Worker support. label Mar 15, 2018
@aaronleeatali
Copy link

aaronleeatali commented Mar 26, 2018

I wish you can take a look at here. https://github.com/alibaba/AliOS-nodejs/wiki/Workers-in-Node.js-based-on-multithreaded-V8
We propose a system for asynchronous multithreading with an API confirming to Web Worker standard in Node.js and maybe the performance improvements show that it's worth the effort.

@jasnell
Copy link
Member

jasnell commented Sep 21, 2018

@addaleax ... now that things are progressing along nicely with workers, does this issue need to remain open?

@apapirovski
Copy link
Member

Sounds like we can close this? Everything mentioned here has been done. If another tracking issue needed, there should probably be a new one that's more current and isn't as much of a commitment to read through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. worker Issues and PRs related to Worker support.
Projects
None yet
Development

No branches or pull requests