Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] lib: optimize util.format() #215

Closed
wants to merge 3 commits into from

Conversation

bnoordhuis
Copy link
Member

Optimize util.format() by parsing the format string and generating
specialized code on the fly, then caching the result in a LRU list.

This change is based on the observation that most applications that
log extensively, often have very skewed distributions of log patterns.
It's common to see top 10 or top 25 of popular patterns, followed by
a long (sometimes very long) tail of less popular patterns.

This is a work in progress: the common case is currently 2-25x faster
but the worst case - every pattern unique - is 4-5x slower.

WIP, soliciting comments.

Make a number of functions from require('_linklist') return the list or
the item just inserted.  Makes the API a little less unwieldy to use.
Add a simple (too simple!) benchmark for util.format().
Optimize util.format() by parsing the format string and generating
specialized code on the fly, then caching the result in a LRU list.

This change is based on the observation that most applications that
log extensively, often have very skewed distributions of log patterns.
It's common to see top 10 or top 25 of popular patterns, followed by
a long (sometimes very long) tail of less popular patterns.

This is a work in progress: the common case is currently 2-25x faster
but the worst case - every pattern unique - is 4-5x slower.
@rvagg rvagg mentioned this pull request Dec 30, 2014
@piscisaureus piscisaureus reopened this Dec 30, 2014
source += ' return s;';
source += ' };';

return Function('inspect', source)(inspect);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: the generated code gets optimized as of https://codereview.chromium.org/821553003/

@trevnorris
Copy link
Contributor

@bnoordhuis Side note, I've noticed that v8::Script::Compile() then v8::Script::Run() is 2x faster than new Function() from JS.

@yosuke-furukawa
Copy link
Member

I have benchmark results using template literals vs use "+" vs util.format.

Just FYI:
http://jsperf.com/template-literal-vs-string-plus/2

@algesten
Copy link

@yosuke-furukawa OMG! xD

@trevnorris
Copy link
Contributor

@yosuke-furukawa I'll have to refer you to http://mrale.ph/blog/2012/12/15/microbenchmarks-fairy-tale.html. The benchmark you show can be pretty much optimized out completely.

}

function untaint(s) {
return s.replace(/[\\"]|[^\x20-\x7F]/g, function(c) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn’t /[\\"\r\n\u2028\u2029]/g enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you bring it up: probably not. With Harmony template strings being available now, the generated code is susceptible to template literal injection. I'll have to look at this again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/[\\"\r\n\u2028\u2029]/g is enough, and template strings don’t affect the parsing of double-quoted string literals.

@tellnes
Copy link
Contributor

tellnes commented Feb 5, 2015

Since this introduces a new module (sprintf) which conflicts with a similar module on npm (npm.im/sprintf), should this be a major version bump. The sprintf should also probably be documented.

source += ', a' + i;

source += ') { var argc = arguments.length, s = "";' + body;
source += ' for (var i = ' + argc + ' + 1; i < arguments.length; i += 1) {';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to use an internal argc instead of arguments.length for for loop as a cached value? Otherwise argc is unused.

@mscdex mscdex added the util Issues and PRs related to the built-in util module. label Mar 22, 2015
@chrisdickinson
Copy link
Contributor

Since #848 landed, we can probably introduce sprintf as an internal module.

@brendanashworth brendanashworth added the wip Issues and PRs that are still a work in progress. label May 11, 2015
@Qard
Copy link
Member

Qard commented Jun 26, 2015

Is anything happening with this at this point, or can it be closed?

@bnoordhuis
Copy link
Member Author

I'll close it and revisit when I have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues and PRs related to the performance of Node.js. util Issues and PRs related to the built-in util module. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.