User-land snapshot JS API: request for feedback #42617

Closed
joyeecheung opened this issue Apr 6, 2022 · 7 comments
Labels
snapshot Issues and PRs related to the startup snapshot

Comments

@joyeecheung
Member

joyeecheung commented Apr 6, 2022

To improve the user experience of the user-land snapshot (quoting myself from #38905 (comment)), we need an API to specify user-land deserializers and a deserialized main function. That way, when the application starts up from the snapshot, a main script isn't necessary if the user has already baked a main function into the snapshot. With build-time snapshots (and maybe also run-time snapshots in later iterations), this could effectively turn the result into a single-file executable of an application.

I have a non-functional WIP here. My current idea is that in Node.js instances started in snapshot-building mode, we provide two hooks to the user, exposed by the v8.snapshot namespace, to specify user-land deserializers and the main function (ideas for better naming or better hooks are welcome):

  • addDeserializer(func, data): can be invoked multiple times, in different places where the user needs to serialize/synchronize something that's not pure JS (e.g. system resources). The deserializers added will be invoked when the snapshot is deserialized. The second parameter, data, will be passed into func when it's invoked (similar to how we handle timer callbacks).
    • Note: modules can also detect the existence of addDeserializer from v8.snapshot to see if the application is run in snapshot building mode, so that they can add deserializers as needed.
  • setDeserializedMainFunction(func, data): as the name implies, this can only be invoked once; it throws an error on the second attempt. If the main function is set, the deserialized application does not need another entry point script specified on the command line when it is started. Instead, the specified function will be invoked after all the deserializers have been invoked.
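The semantics of the two hooks described above can be modelled in plain JavaScript. This is only a sketch of the described behaviour, not the Node.js implementation: createSnapshotHooks() and runDeserialized() are hypothetical names, and runDeserialized() stands in for whatever the runtime would do when starting from a snapshot.

```javascript
// Hypothetical user-land model of the two proposed hooks. This is NOT the
// Node.js implementation; it only mirrors the semantics described above.
function createSnapshotHooks() {
  const deserializers = [];  // { func, data } pairs, in registration order
  let main = null;           // { func, data } once the main function is set

  return {
    addDeserializer(func, data) {
      deserializers.push({ func, data });
    },
    setDeserializedMainFunction(func, data) {
      // Can only be set once; the second attempt throws.
      if (main !== null) {
        throw new Error('The deserialized main function is already set');
      }
      main = { func, data };
    },
    // Stand-in for startup from a snapshot: deserializers first, then main.
    runDeserialized() {
      for (const { func, data } of deserializers) func(data);
      if (main !== null) main.func(main.data);
      return main !== null;  // false means an entry point script is still needed
    },
  };
}

// Usage: record the order in which the callbacks run.
const hooks = createSnapshotHooks();
const calls = [];
hooks.addDeserializer(
  ({ filename }) => calls.push(`reopen ${filename}`),
  { filename: '/tmp/history.txt' }
);
hooks.addDeserializer((name) => calls.push(`reconnect ${name}`), 'db');
hooks.setDeserializedMainFunction(
  (data) => calls.push(`main ${data.mode}`),
  { mode: 'cli' }
);
const ranMain = hooks.runDeserialized();
```

In this model the deserializers run in registration order, each receiving its own data, and the main function runs last.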

Example: assuming the following snippet is saved as snapshot.js, the binary can be built with configure --node-snapshot-main=snapshot.js at build time (requires compiling Node.js from source):

// In the snapshot main script passed to the --node-snapshot-main configure option:
const fs = require('fs');
const {
  addDeserializer,
  setDeserializedMainFunction
} = require('v8').snapshot;

// An example of allocating system resources (the fs stream) that need to be handled
// across snapshot serialization/deserialization boundaries:
function startHistory(filename) {
  const stream = fs.createWriteStream(filename, { flags: 'a' });
  const interval = setInterval(() => {
    stream.write('some history\n');
  }, 500);
  interval.unref();
  return stream;
}

const filename = '/tmp/history.txt';
let stream = startHistory(filename);
// Reopen the stream when the snapshot is deserialized.
addDeserializer(({ filename }) => {
  stream = startHistory(filename);
}, { filename });  // The second parameter gets passed into the function.

// Do other stuff..

// Release the system resources that can be recreated later when the snapshot
// is deserialized.
stream.end();

const data = {};  // Placeholder payload for illustration.
setDeserializedMainFunction(
  (data) => {
    // If the binary with the snapshot is started as `node arg1 arg2`,
    // process.argv[1] would be 'arg1' and process.argv[2] would be 'arg2',
    // etc.
  },
  data  // The second parameter gets passed into the function.
);

// If the main function is not set, the binary with the snapshot needs to be started with an
// additional entry point like `node index.js arg1 arg2`,
// and process.argv[1] would be 'index.js',  process.argv[2]  would be 'arg1',
// etc.

With the run-time snapshot (at least in the initial iteration), the snapshot blob would be written to disk (e.g. node --build-snapshot snapshot.js creates a snapshot.blob in the current working directory), so the user still needs to start the application by pointing at the snapshot blob file (e.g. node --snapshot-blob snapshot.blob arg1 arg2). In the next iteration, though, we could create a copy of the binary and then append that blob to create a single-file executable, so that users can do this without having to compile Node.js from source.

Refs: #35711

@joyeecheung joyeecheung added the snapshot Issues and PRs related to the startup snapshot label Apr 6, 2022
@joyeecheung
Member Author

cc @nodejs/startup

@joyeecheung joyeecheung changed the title User-land snapshot entry point API: request for feedback User-land snapshot JS API: request for feedback Apr 6, 2022
@benjamingr
Member

This looks neat.

@legendecas
Member

Should we distinguish the point at which the process is about to take the startup snapshot from the point at which the snapshotting process is about to exit? With #42466, those two points are the same: the only event we can get is process#exit.

@joyeecheung
Member Author

joyeecheung commented Apr 7, 2022

Should we distinguish the point at which the process is about to take the startup snapshot

Sounds like a good idea. I wonder if process.on('snapshot') would be good, or maybe process.on('serialize') in case we want to emit an event for deserialization too.

Another alternative is to simply collect callbacks with a function (like addSerializer(), similar to addDeserializer() in the OP). The downside is that it might be less readable (though this is subjective); the upside is that the callbacks can't be tampered with through process._events and we don't have to worry about AbortSignal integration etc. (We could consider adding event support later using these functions, though.)
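To make the trade-off concrete, here is a small sketch contrasting the two options. It rests on assumptions: emitter stands in for process, 'serialize' is just the event name floated above, and createSerializerRegistry() with its runSerializers() method is hypothetical (in a real implementation, running the callbacks would be internal to the runtime rather than exposed).

```javascript
const { EventEmitter } = require('events');

const log = [];

// Option A: an event on an emitter (`emitter` stands in for `process`).
const emitter = new EventEmitter();
emitter.on('serialize', () => log.push('event listener'));
// Any code holding the emitter can drop or replace the listeners.
emitter.removeAllListeners('serialize');
emitter.emit('serialize');  // nothing runs: the listener was removed

// Option B: callbacks collected in a closure (hypothetical addSerializer()).
function createSerializerRegistry() {
  const callbacks = [];  // not reachable from outside this closure
  return {
    addSerializer(func, data) {
      callbacks.push({ func, data });
    },
    // Stand-in for the runtime running the callbacks once, right before
    // serialization.
    runSerializers() {
      for (const { func, data } of callbacks) func(data);
    },
  };
}

const registry = createSerializerRegistry();
registry.addSerializer(
  (what) => log.push(`closure callback: ${what}`),
  'close streams'
);
registry.runSerializers();
```

Only the closure-held callback ends up in log, because the event listener was removed by code that merely had access to the emitter.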

@legendecas
Member

legendecas commented Apr 7, 2022

I'd prefer a function in the require('v8').snapshot namespace rather than process events, for the reason you've stated.

One point on naming: addSerializer sounds like it adds custom logic for serializing a JavaScript object, which is not how the mechanism works. The callback is called once before serialization, not once per JavaScript object. So I think it would be better named addBeforeSerializeCallback or something similar. The same applies to addDeserializer.

@guest271314

If I understand the requirement and use case correctly, this will be useful. I use Native Messaging. I only need the node executable from the binary download to run JavaScript locally; I don't need the rest of the folders and files in the download. I haven't found any means to build just the node executable in /bin without the rest of the files created when Node.js is built. My use case is dynamically writing the node executable (potentially with the Native Messaging host code included in a single executable), using Native Messaging to meet requirements, then truncating the node executable to 0 when not in use.

@joyeecheung
Member Author

joyeecheung commented Apr 19, 2022

The WIP is now functional, and I have a test demonstrating the current APIs. In particular, I added an isBuildingSnapshot() method and used the name require('v8').startupSnapshot as the namespace (to avoid confusion with the heap snapshot and the WIP web snapshot).

addSerializeCallback is used in is_main_thread.js to clean up the std I/O streams, so I didn't write a separate test for it (I will probably add one later).

'use strict';

const fs = require('fs');
const path = require('path');
const assert = require('assert');
const {
  isBuildingSnapshot,
  addDeserializeCallback,
  setDeserializeMainFunction
} = require('v8').startupSnapshot;

let deserializedKey;
const storage = {};

function checkFileInSnapshot(storage) {
  assert(!isBuildingSnapshot());
  const fixture = process.env.NODE_TEST_FIXTURE;
  const readFile = fs.readFileSync(fixture);
  console.log(`Read ${fixture} in deserialize main, length = ${readFile.byteLength}`);
  assert.deepStrictEqual(storage[deserializedKey], readFile);
}

if (isBuildingSnapshot()) {
  const fixture = path.join(__filename);

  const file = fs.readFileSync(fixture);
  console.log(`Read ${fixture} in snapshot main, length = ${file.byteLength}`);
  storage[fixture] = file;

  addDeserializeCallback((key) => {
    console.log('running deserialize callback');
    deserializedKey = key;
  }, fixture);

  setDeserializeMainFunction(
    checkFileInSnapshot,
    storage
  );
}

$ ./configure --ninja --node-snapshot-main=test/fixtures/snapshot/v8-startup-snapshot-api.js
$ ninja -C out/Release node
ninja: Entering directory `out/Release'
[1/4] ACTION node: node_mksnapshot_9b7a2d2290b02e76d66661df74749f56
Read /Users/joyee/projects/node/test/fixtures/snapshot/v8-startup-snapshot-api.js in snapshot main, length = 1009
[4/4] LINK node, POSTBUILDS
$ NODE_TEST_FIXTURE=test/fixtures/snapshot/v8-startup-snapshot-api.js out/Release/node
running deserialize callback
Read test/fixtures/snapshot/v8-startup-snapshot-api.js in deserialize main, length = 1009
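A module that wants to work both inside and outside a snapshot build could guard its hook registration roughly as follows. This is a sketch under assumptions: registerSnapshotHooks() and the resource object are hypothetical, and the namespace check is there only because older Node.js versions do not have v8.startupSnapshot.

```javascript
const v8 = require('v8');

// Returns true if the hooks were registered, false when this process is not
// building a startup snapshot (or the namespace is unavailable).
function registerSnapshotHooks(resource) {
  const api = v8.startupSnapshot;
  if (!api || !api.isBuildingSnapshot()) return false;

  // Release the resource before the snapshot is taken, and recreate it
  // when the snapshot is deserialized. The second argument is passed
  // into each callback.
  api.addSerializeCallback((res) => res.close(), resource);
  api.addDeserializeCallback((res) => res.open(), resource);
  return true;
}

// Hypothetical resource, used for illustration only.
const resource = {
  isOpen: true,
  open() { this.isOpen = true; },
  close() { this.isOpen = false; },
};

const registered = registerSnapshotHooks(resource);
```

When run as a regular script (not in snapshot-building mode), registered is false and the resource is left untouched.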

targos pushed a commit that referenced this issue Jul 12, 2022
This adds several APIs under the `v8.startupSnapshot` namespace
for specifying hooks into the startup snapshot serialization
and deserialization.

- isBuildingSnapshot()
- addSerializeCallback()
- addDeserializeCallback()
- setDeserializeMainFunction()

PR-URL: #43329
Fixes: #42617
Refs: #35711
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
targos pushed a commit that referenced this issue Jul 31, 2022
guangwong pushed a commit to noslate-project/node that referenced this issue Oct 10, 2022
This adds several APIs under the `v8.startupSnapshot` namespace
for specifying hooks into the startup snapshot serialization
and deserialization.

- isBuildingSnapshot()
- addSerializeCallback()
- addDeserializeCallback()
- setDeserializeMainFunction()

PR-URL: nodejs/node#43329
Fixes: nodejs/node#42617
Refs: nodejs/node#35711
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
4 participants