
Optimize position search in error path #1160

Merged: 1 commit, Jul 28, 2024

Conversation

purplesyringa
Contributor
Translating an index into a line/column pair takes considerable time. Notably, the JSON benchmark modified to run on malformed data spends around 50% of the CPU time generating the error object.

While it is generally assumed that the cold path is quite slow, such a drastic pessimization may be unexpected, especially when a faster implementation exists.

Using the vectorized routines provided by the memchr crate increases the performance of the failure path by 2x on average.

Old implementation:

                        DOM         STRUCT
data/canada.json        122 MB/s    168 MB/s
data/citm_catalog.json  135 MB/s    195 MB/s
data/twitter.json       142 MB/s    226 MB/s

New implementation:

                        DOM         STRUCT
data/canada.json        216 MB/s    376 MB/s
data/citm_catalog.json  238 MB/s    736 MB/s
data/twitter.json       210 MB/s    492 MB/s

While this introduces a new dependency, memchr is much faster to compile than serde, so compile time does not increase significantly. Additionally, memchr provides a more efficient SWAR-based implementation of both the memchr and count routines even without std, providing benefits for embedded uses as well.
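For illustration, here is a minimal sketch of how the index-to-position translation can be done with memchr's vectorized memrchr and memchr_iter; the helper name, signature, and the exact line/column conventions are assumptions for this example, not necessarily the code in this PR:

    use memchr::{memchr_iter, memrchr};

    /// Translate a byte offset into a 1-based (line, column) pair using
    /// vectorized searches instead of a byte-by-byte scan.
    fn line_and_column(input: &[u8], index: usize) -> (usize, usize) {
        let prefix = &input[..index];
        // Start of the current line: one past the last '\n' before `index`.
        let line_start = memrchr(b'\n', prefix).map_or(0, |pos| pos + 1);
        // Line number: newlines before the current line, plus one.
        let line = 1 + memchr_iter(b'\n', &prefix[..line_start]).count();
        // Column: byte distance from the start of the line, 1-based here.
        let column = index - line_start + 1;
        (line, column)
    }

    fn main() {
        let data = b"{\"a\": 1,\n \"b\": ?}";
        // The '?' sits at byte offset 15: line 2, column 7.
        assert_eq!(line_and_column(data, 15), (2, 7));
    }

The point is that memrchr locates the start of the current line in one vectorized pass, and counting the preceding newlines goes through memchr's specialized iterator count rather than a per-byte loop.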

@purplesyringa
Contributor Author

For easier comparison, here are the results of the benchmark for the success path on the same machine:

                        DOM         STRUCT
data/canada.json        283 MB/s    416 MB/s
data/citm_catalog.json  429 MB/s    864 MB/s
data/twitter.json       275 MB/s    541 MB/s

@dtolnay (Member) left a comment
Thank you!

I wonder where else memchr can be profitably applied. Maybe this loop:

json/src/read.rs, lines 546 to 548 in 40dd7f5:

    while self.index < self.slice.len() && !ESCAPE[self.slice[self.index] as usize] {
        self.index += 1;
    }

We can use memchr2 to find " or \, then separately check the intervening characters for ch >= 0x20 (i.e., that they are not control characters). I bet it would be faster than the current loop. This is a very similar pattern to what's discussed in https://nrk.neocities.org/articles/cpu-vs-common-sense.
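A rough sketch of that suggestion, assuming a free-standing helper over a raw slice rather than the reader's actual internal state (skip_to_escape and its signature are hypothetical):

    use memchr::memchr2;

    // Find the end of an unescaped run inside a JSON string: the next '"',
    // '\', or control character.
    fn skip_to_escape(slice: &[u8], mut index: usize) -> usize {
        // One vectorized search for the next '"' or '\' ...
        let stop = match memchr2(b'"', b'\\', &slice[index..]) {
            Some(offset) => index + offset,
            None => slice.len(),
        };
        // ... then a separate check that the intervening bytes satisfy
        // ch >= 0x20; a control character ends the run early.
        while index < stop && slice[index] >= 0x20 {
            index += 1;
        }
        index
    }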

@dtolnay merged commit b0d678c into serde-rs:master on Jul 28, 2024
13 checks passed
@purplesyringa deleted the efficient-position branch on July 29, 2024 05:17