Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bootstrap: reorganize the bootstrap process #27539

Closed
wants to merge 6 commits into from

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented May 2, 2019

This patch reorganizes the bootstrap process so it's easier
to snapshot it incrementally.

src: inline ProcessCliArgs in the Environment constructor

Inline ProcessCliArgs() in the Environment constructor, and
emit the Environment creation trace events with the arguments
earlier. Remove the unused arguments passed to CreateProcessObject()
since these are now attached to process in PatchProcessObject()
during pre-execution instead.

src: create Environment properties in Environment::CreateProperties()

Move creation of env->as_callback_data(), env->primordials()
and env->process() into Environment::CreateProperties() and
call it in the Environment constructor - this can be replaced with
deserialization when we snapshot the per-environment properties
after the instantiation of Environment.

src: reorganize inspector and diagnostics initialization

  • Split the initialization of the inspector and other diagnostics
    into Environment::InitializeInspector() and
    Environment::InitializeDiagnostics() - these need to be
    reinitialized separately after snapshot deserialization.
  • Do not store worker url alongside the inspector parent handle,
    instead just get it from the handle.
  • Rename Worker::profiler_idle_notifier_started_ to
    Worker::start_profiler_idle_notifier_ because it stores
    the state inherited from the parent env to use for initializing
    itself.

src: Split RunBootstrapping()

Split RunBootstrapping() into BootstrapInternalLoaders()
and BootstrapNode() from so the two can be snapshotted
incrementally.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels May 2, 2019
@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot
Copy link
Collaborator

@joyeecheung
Copy link
Member Author

cc @nodejs/process

@joyeecheung
Copy link
Member Author

Based on the GitHub reviewer suggestions...@addaleax @bnoordhuis @jasnell can I have some review please?

@joyeecheung
Copy link
Member Author

Ping...I would like to have this landed first before moving to snapshot more of the bootstrap process (it won't be possible without changing the order of inspector and event loop initialization, really)

@joyeecheung joyeecheung added the review wanted PRs that need reviews. label May 9, 2019
@Fishrock123
Copy link
Contributor

Can this be broken down into multiple commits? That might make review easier...

@Fishrock123
Copy link
Contributor

Ok, so I try not to mention this because it's what we (NodeSource) have signed up for in maintaining our own node fork/runtime (N|Solid)... however for Node 12 so much of the startup and related internals have been moved around so much that the rebase pain is quite immense. 😓

Are we really gaining much from doing these changes...? (I mean, this PR is particular..)

@joyeecheung
Copy link
Member Author

joyeecheung commented May 14, 2019

@Fishrock123 This is required for including these parts in the v8 snapshot, see the design doc on the constraints the Environment needs to follow in order for it to be snapshottable.

I think I've laid out the reasons for doing these in the PR message clearly enough, but to explain further:

  1. We need to deserialize the env->props() properties from the snapshot instead of creating them from scratch. It would be easier if the creation of them is separated in a method and then you'll be able to control when these properties are created all together.
  2. The snapshot creator discards inspector states as well as a bunch of other per-isolate callbacks. We need to reinstall them after the environment is deserialized. So these are put into separate methods.
  3. Similarly it would be easier to snapshot loaders.js and node.js incrementally if we split the invocation of them into separate methods.

On how this refactoring is used, see this for a prototype. The alternative to refactoring is..well, copy pasting all these code for the snapshot creator, and adding a bunch of deserialization in the middle, like what was done in the initial prototype of the RFC. That would not be maintainable, and may not really help with rebasing as there would still be conflicts with deserialization code.

@joyeecheung
Copy link
Member Author

joyeecheung commented May 14, 2019

By the way, from my experience maintaining a similar fork in my previous job, one way to make the rebasing easier is instead of rebasing, squash the changes into commits that make sense, and cherry-picking them onto the upstream branches. This is especially important for paths like the bootstrap, because it has been quite messy - most things were done in a certain order for no particularly good reason but simply because it worked. The downstream patches likely follow the same kind of reasoning when they add stuff in the middle, so once someone tries to make the upstream more organized, there would be a lot of conflicts since the patches still relied on that disorganized state of things.

It would be easier to bring the two together if the downstream patches encapsulate what they need to do in a few method invocations (with implementation details written in separate files to reduce conflicts), and when being cherry-picked onto the upstream, just insert the invocation to the new places where it makes more sense for these to happen. The worst that could happen is when you implement things right inside the upstream files in the middle of the code that's being refactored, and the downstream commits are mixed in the history with the upstream commits. Then it would very difficult to rebase once a refactored branch comes out (I had worked with that kind of git states before so I know how painful it could be). Although once the upstream starts to become organized, it tend to be easier to merge the two from then on because the patches are then inserted into logical places that tend to stabalize.

Inline `ProcessCliArgs()` in the `Environment` constructor, and
emit the `Environment` creation trace events with the arguments
earlier. Remove the unused arguments passed to `CreateProcessObject()`
since these are now attached to process in `PatchProcessObject()`
during pre-execution instead.
Move creation of `env->as_callback_data()`, `env->primordials()`
and `env->process()` into `Environment::CreateProperties()` and
call it in the `Environment` constructor - this can be replaced with
deserialization when we snapshot the per-environment properties
after the instantiation of `Environment`.
- Split the initialization of the inspector and other diagnostics
  into `Environment::InitializeInspector()` and
  `Environment::InitializeDiagnostics()` - these need to be
  reinitialized separately after snapshot deserialization.
- Do not store worker url alongside the inspector parent handle,
  instead just get it from the handle.
- Rename `Worker::profiler_idle_notifier_started_` to
  `Worker::start_profiler_idle_notifier_` because it stores
  the state inherited from the parent env to use for initializing
  itself.
Split `RunBootstrapping()` into `BootstrapInternalLoaders()`
and `BootstrapNode()` from so the two can be snapshotted
incrementally.
@joyeecheung
Copy link
Member Author

@Fishrock123 @addaleax @bnoordhuis @jasnell I split the PR into multiple commits, the descriptions are updated in the OP. PTAL, thanks!

@nodejs-github-bot
Copy link
Collaborator

@joyeecheung
Copy link
Member Author

joyeecheung commented May 27, 2019

Pinging for reviews again...this has been open for over three weeks without any reviews

src/env.cc Outdated Show resolved Hide resolved
src/env.cc Show resolved Hide resolved
src/node_worker.cc Show resolved Hide resolved
@nodejs-github-bot
Copy link
Collaborator

src/node.cc Outdated
Isolate* isolate = env->isolate();
Local<Context> context = env->context();
isolate_->GetHeapProfiler()->AddBuildEmbedderGraphCallback(
Environment::BuildEmbedderGraph, this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dependent on #if HAVE_INSPECTOR now, but you can do heap dumps without the inspector, so should we maybe not do this here? (Sorry that I’m bringing it up again 😄)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An right, maybe it also makes sense to move it back to InitializeDiagnostics and then we invoke InitializeDiagnostics before InitializeInspector. The other part of InitializeDiagnostics is adding GC callbacks for dtrace, I think it also make sense to do that before starting the inspector agent.

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@joyeecheung
Copy link
Member Author

joyeecheung commented Jun 3, 2019

Landed in f0018a5...0778599

@joyeecheung joyeecheung closed this Jun 3, 2019
joyeecheung added a commit that referenced this pull request Jun 3, 2019
Inline `ProcessCliArgs()` in the `Environment` constructor, and
emit the `Environment` creation trace events with the arguments
earlier. Remove the unused arguments passed to `CreateProcessObject()`
since these are now attached to process in `PatchProcessObject()`
during pre-execution instead.

PR-URL: #27539
Reviewed-By: Anna Henningsen <anna@addaleax.net>
joyeecheung added a commit that referenced this pull request Jun 3, 2019
Move creation of `env->as_callback_data()`, `env->primordials()`
and `env->process()` into `Environment::CreateProperties()` and
call it in the `Environment` constructor - this can be replaced with
deserialization when we snapshot the per-environment properties
after the instantiation of `Environment`.

PR-URL: #27539
Reviewed-By: Anna Henningsen <anna@addaleax.net>
joyeecheung added a commit that referenced this pull request Jun 3, 2019
- Split the initialization of the inspector and other diagnostics
  into `Environment::InitializeInspector()` and
  `Environment::InitializeDiagnostics()` - these need to be
  reinitialized separately after snapshot deserialization.
- Do not store worker url alongside the inspector parent handle,
  instead just get it from the handle.
- Rename `Worker::profiler_idle_notifier_started_` to
  `Worker::start_profiler_idle_notifier_` because it stores
  the state inherited from the parent env to use for initializing
  itself.

PR-URL: #27539
Reviewed-By: Anna Henningsen <anna@addaleax.net>
joyeecheung added a commit that referenced this pull request Jun 3, 2019
Split `RunBootstrapping()` into `BootstrapInternalLoaders()`
and `BootstrapNode()` from so the two can be snapshotted
incrementally.

PR-URL: #27539
Reviewed-By: Anna Henningsen <anna@addaleax.net>
BridgeAR pushed a commit that referenced this pull request Jun 17, 2019
Inline `ProcessCliArgs()` in the `Environment` constructor, and
emit the `Environment` creation trace events with the arguments
earlier. Remove the unused arguments passed to `CreateProcessObject()`
since these are now attached to process in `PatchProcessObject()`
during pre-execution instead.

PR-URL: #27539
Reviewed-By: Anna Henningsen <anna@addaleax.net>
BridgeAR pushed a commit that referenced this pull request Jun 17, 2019
Move creation of `env->as_callback_data()`, `env->primordials()`
and `env->process()` into `Environment::CreateProperties()` and
call it in the `Environment` constructor - this can be replaced with
deserialization when we snapshot the per-environment properties
after the instantiation of `Environment`.

PR-URL: #27539
Reviewed-By: Anna Henningsen <anna@addaleax.net>
BridgeAR pushed a commit that referenced this pull request Jun 17, 2019
- Split the initialization of the inspector and other diagnostics
  into `Environment::InitializeInspector()` and
  `Environment::InitializeDiagnostics()` - these need to be
  reinitialized separately after snapshot deserialization.
- Do not store worker url alongside the inspector parent handle,
  instead just get it from the handle.
- Rename `Worker::profiler_idle_notifier_started_` to
  `Worker::start_profiler_idle_notifier_` because it stores
  the state inherited from the parent env to use for initializing
  itself.

PR-URL: #27539
Reviewed-By: Anna Henningsen <anna@addaleax.net>
BridgeAR pushed a commit that referenced this pull request Jun 17, 2019
Split `RunBootstrapping()` into `BootstrapInternalLoaders()`
and `BootstrapNode()` from so the two can be snapshotted
incrementally.

PR-URL: #27539
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@BridgeAR BridgeAR mentioned this pull request Jun 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. review wanted PRs that need reviews.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants