Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: speed up package.json parsing #15767

Closed
wants to merge 2 commits into from

Conversation

bnoordhuis
Copy link
Member

@bnoordhuis bnoordhuis commented Oct 4, 2017

The first commit is a bug fix and should be back-ported to the release branches but I'm including it here because the other commits build on top of it.

From the second commit log:

If the package.json does not contain the string '"main"', skip parsing
it to JSON.

Note that this changes the behavior of the module loader in the presence
of package.json files that don't contain legal JSON. Such files used to
throw an exception but now they are simply ignored unless they contain a
"main" property.

To me, that seems like a good trade-off: I observe a 25% reduction in
start-up time on a medium-sized application

That 25% is going to depend on how many package.json files have a "main" property. With another project, only a handful out of ~250 did not have one.

CI: https://ci.nodejs.org/job/node-test-pull-request/10401/

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Oct 4, 2017
@Trott
Copy link
Member

Trott commented Oct 4, 2017

Thoughts on semver-ness?

@bnoordhuis
Copy link
Member Author

I'm inclined to say semver-patch. The change in behavior only affects applications that don't start up in the first place.

@@ -112,6 +112,9 @@ function readPackage(requestPath) {
return false;
}

if (json === '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does V8 make this and json.length === 0 equivalent instruction-wise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.length === 0 is probably a bit slower. The empty string is a singleton so s === '' is basically a pointer comparison, whereas s.length needs to validate s first (although I expect it will be a fast path protected by a guard once optimized.)

@guybedford
Copy link
Contributor

On the example project with ~250 dependencies and only a handful without a main, how did the performance copmare? Just wondering how much the slow-down might be for an application where every package.json file does have a main and how negligible this is.

@addaleax
Copy link
Member

addaleax commented Oct 4, 2017

Just to give a number, of the 596 package.jsons in Node’s tools/ and deps/, 364 contain the string "main", so it’s reasonable to assume that about 40 % of package.jsons in the wild can be skipped this way.

@bnoordhuis
Copy link
Member Author

On the example project with ~250 dependencies and only a handful without a main, how did the performance copmare?

It's the same, no statistically significant speed-up or slow-down. The string search is practically free compared to deserializing the JSON object.

One potential improvement is to implement our own special-purpose JSON parser that scans for just the "main" property and ignores everything else - that would help with loading e.g. async, which has a > 50 kB package.json.

That's a lot more work though and I'm not even sure it's a good idea per se.

@guybedford
Copy link
Contributor

That may well be interesting to work on in future for additional performance gain.

Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@mscdex
Copy link
Contributor

mscdex commented Oct 5, 2017

Should this PR (except the first commit) be semver-major because of the change in module loading behavior (invalid/malformatted modules may no longer be detected)?

@bnoordhuis
Copy link
Member Author

@mscdex My opinion in case you missed it: #15767 (comment)

@Fishrock123
Copy link
Contributor

The first commit is a bug fix and should be back-ported to the release branches but I'm including it here because the other commits build on top of it.

It would be far more ideal to releasers if this was separate from commits that aren't that category.

Such files used to
throw an exception but now they are simply ignored unless they contain a
"main" property.

If that exception bubbled up to user land I do think this (the sum of the 3 commits) is semver-major?

If the package.json does not contain the string '"main"', skip parsing
it to JSON.

Note that this changes the behavior of the module loader in the presence
of package.json files that don't contain legal JSON.  Such files used to
throw an exception but now they are simply ignored unless they contain a
"main" property.

To me, that seems like a good trade-off: I observe a 25% reduction in
start-up time on a medium-sized application[0].

[0] https://github.com/strongloop/sls-sample-app
Move the logic from the previous commit to C++ land in order to avoid
creating a new string when we know we won't parse it anyway.
@bnoordhuis
Copy link
Member Author

@bnoordhuis bnoordhuis closed this Nov 15, 2017
bnoordhuis added a commit to bnoordhuis/io.js that referenced this pull request Nov 15, 2017
If the package.json does not contain the string '"main"', skip parsing
it to JSON.

Note that this changes the behavior of the module loader in the presence
of package.json files that don't contain legal JSON.  Such files used to
throw an exception but now they are simply ignored unless they contain a
"main" property.

To me, that seems like a good trade-off: I observe a 25% reduction in
start-up time on a medium-sized application[0].

[0] https://github.com/strongloop/sls-sample-app

PR-URL: nodejs#15767
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
bnoordhuis added a commit to bnoordhuis/io.js that referenced this pull request Nov 15, 2017
Move the logic from the previous commit to C++ land in order to avoid
creating a new string when we know we won't parse it anyway.

PR-URL: nodejs#15767
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@bnoordhuis bnoordhuis added the semver-major PRs that contain breaking changes and should be released in the next major version. label Nov 15, 2017
@bnoordhuis
Copy link
Member Author

1132ea7...0fdd88a and tagged semver-major because I'm conservative like that.

@charmander
Copy link
Contributor

"\u006dain"?

@jdalton
Copy link
Member

jdalton commented Nov 16, 2017

I think this should be revisited. InternalModuleReadFile was more generically useful before and could have been applied to loading more than just json.

However, now with the introduction of this bit it essentially locks it into json only. If that's the case, I think the name InternalModuleReadFile, and the pseudo exposed process.binding("fs").internalModuleReadFile, should either be renamed or a new internal for json-only should be created (maybe that wraps InternalModuleReadFile).

Update:
Raised issue #17076 and PR #17084.

@charmander
Copy link
Contributor

Does this fail if "main" is written with \u escapes?

@bnoordhuis
Copy link
Member Author

@charmander I guess it would. Can you point to a module that does that?

@jdalton
Copy link
Member

jdalton commented Nov 18, 2017

@bnoordhuis

That 25% is going to depend on how many package.json files have a "main" property. With another project, only a handful out of ~250 did not have one.

Was the perf win, when applicable, primarily from the first commit (the one that didn't touch the c++ helper)? If so I think that could be enough, without having to go deeper into the internals #17084.

@bnoordhuis
Copy link
Member Author

@jdalton The second commit avoids ~300 kB of strings being created needlessly when you invoke npm. That's pretty substantial.

@jdalton
Copy link
Member

jdalton commented Nov 19, 2017

@bnoordhuis

The second commit avoids ~300 kB of strings being created needlessly when you invoke npm. That's pretty substantial.

What's that mean for execution time? (e.g. does it move from 25% to 30% improvement?)

What I'm getting at is, that while it is nice, it's not a guaranteed perf win since it's super scenario specific (npm init creates package.json files with a "main" field) and also diminishes the usefulness from the helper (moved from generic read file fast path to only json). So it's a step too far in the name of a might-be optimization.

@bnoordhuis
Copy link
Member Author

It cuts down the number and total size of package.json strings by about 35%. To put that in perspective: even a 1% or 2% improvement would have been well worth it, every bit helps.

If you don't think it's a worthwhile optimization, the onus is on you to prove it with numbers (but you'd be wasting your time because it is.)

@jdalton
Copy link
Member

jdalton commented Nov 19, 2017

To put that in perspective: even a 1% or 2% improvement would have been well worth it, every bit helps.

In the past, improvements like using InternalReadFile for more than just json, which I'm sure would fall into that category or better, were held off on because of potential breakage.

If you don't think it's a worthwhile optimization, the onus is on you to prove it with numbers

Wouldn't the responsibility be to prove the perf wins before introducing something that requires a semver major? I'm all for optimizations, but I'd like to have a more clear picture on what we're giving up usability and potential user code breakage (semver major) for. I don't think asking for the percentage wins of the second commit is unreasonable.

So again, what's that mean for execution time? (e.g. does it move from 25% to 30% improvement?)

@bnoordhuis
Copy link
Member Author

The semver-major tag is just me being ultra-conservative (and also about giving people an incentive to upgrade when v10 comes out.)

@jdalton
Copy link
Member

jdalton commented Nov 19, 2017

@bnoordhuis
Instead of locking InternalReadFile down to just json what are your thoughts on creating a InternalReadJON and using InternalReadFile for more than just json (so loading modules). Do you have a wag on what, if any, the improvements would be from that?

@charmander
Copy link
Contributor

@bnoordhuis I can’t, but is it maybe worth also checking for \u006 to avoid making that exception?

@bnoordhuis
Copy link
Member Author

@jdalton See #4679, my original PR did that but it caused regressions on Windows and I never quite figured out why.

@charmander I suppose so, I'll think about it. I don't mind doing that if it's necessary but neither do I like adding complexity and overhead for what is most likely a non-issue.

@jdalton
Copy link
Member

jdalton commented Nov 20, 2017

@bnoordhuis might be worth trying again (with today's code base). It looks like there was other Jenkins weirdness back then too.

@TimothyGu
Copy link
Member

TimothyGu commented Nov 20, 2017

@charmander I think it's mostly a non-issue as npm serializes and reserializes package.json during installation, which replaces \u006d with m. I don't know if Yarn does that, however.

@charmander
Copy link
Contributor

It’s a small amount of code complexity and overhead in exchange for replacing “works for almost all valid JSON representations” and “mostly a non-issue” with “works for all JSON representations” and “is a non-issue”. Is the certainty worth having?

@bnoordhuis
Copy link
Member Author

Not if it's dead code. Then it's just a place for bugs to hide.

(I don't subscribe to Postel's maxim in case you wondered.)

@charmander
Copy link
Contributor

I don't subscribe to Postel's maxim in case you wondered.

That doesn’t really apply here – JSON is JSON. I’ll open a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.