
Optimize position search in error path #1160

Merged: 1 commit, Jul 28, 2024

Conversation

purplesyringa
Contributor
Translating an index into a line/column pair takes considerable time. Notably, the JSON benchmark modified to run on malformed data spends around 50% of the CPU time generating the error object.

While it is generally assumed that the cold path is quite slow, such a drastic pessimization may be unexpected, especially when a faster implementation exists.

Using the vectorized routines provided by the memchr crate increases the performance of the failure path by 2x on average.

Old implementation:

                        DOM         STRUCT
data/canada.json        122 MB/s    168 MB/s
data/citm_catalog.json  135 MB/s    195 MB/s
data/twitter.json       142 MB/s    226 MB/s

New implementation:

                        DOM         STRUCT
data/canada.json        216 MB/s    376 MB/s
data/citm_catalog.json  238 MB/s    736 MB/s
data/twitter.json       210 MB/s    492 MB/s

While this introduces a new dependency, memchr is much faster to compile than serde, so compile time does not increase significantly. Additionally, memchr provides a more efficient SWAR-based implementation of both the memchr and count routines even without std, providing benefits for embedded uses as well.
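For illustration, here is a minimal sketch of how the index-to-position translation can be done with memchr's vectorized memrchr and memchr_iter; the helper name, signature, and the exact line/column conventions are assumptions for this example, not necessarily the code in this PR:

    use memchr::{memchr_iter, memrchr};

    /// Translate a byte offset into a 1-based (line, column) pair using
    /// vectorized searches instead of a byte-by-byte scan.
    fn line_and_column(input: &[u8], index: usize) -> (usize, usize) {
        let prefix = &input[..index];
        // Start of the current line: one past the last '\n' before `index`.
        let line_start = memrchr(b'\n', prefix).map_or(0, |pos| pos + 1);
        // Line number: newlines before the current line, plus one.
        let line = 1 + memchr_iter(b'\n', &prefix[..line_start]).count();
        // Column: byte distance from the start of the line, 1-based here.
        let column = index - line_start + 1;
        (line, column)
    }

    fn main() {
        let data = b"{\"a\": 1,\n \"b\": ?}";
        // The '?' sits at byte offset 15: line 2, column 7.
        assert_eq!(line_and_column(data, 15), (2, 7));
    }

The point is that memrchr locates the start of the current line in one vectorized pass, and counting the preceding newlines goes through memchr's specialized iterator count rather than a per-byte loop.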

@purplesyringa
Contributor Author

For easier comparison, here are the results of the benchmark for the success path on the same machine:

                        DOM         STRUCT
data/canada.json        283 MB/s    416 MB/s
data/citm_catalog.json  429 MB/s    864 MB/s
data/twitter.json       275 MB/s    541 MB/s

@dtolnay (Member) left a comment
Thank you!

I wonder where else memchr can be profitably applied. Maybe this loop:

json/src/read.rs, lines 546 to 548 in 40dd7f5:

    while self.index < self.slice.len() && !ESCAPE[self.slice[self.index] as usize] {
        self.index += 1;
    }

We can use memchr2 to find " or \, then separately check the intervening characters for ch >= 0x20 (i.e., that they are not control characters). I bet it would be faster than the current loop. This is a very similar pattern to what's discussed in https://nrk.neocities.org/articles/cpu-vs-common-sense.
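A rough sketch of that suggestion, assuming a free-standing helper over a raw slice rather than the reader's actual internal state (skip_to_escape and its signature are hypothetical):

    use memchr::memchr2;

    // Find the end of an unescaped run inside a JSON string: the next '"',
    // '\', or control character.
    fn skip_to_escape(slice: &[u8], mut index: usize) -> usize {
        // One vectorized search for the next '"' or '\' ...
        let stop = match memchr2(b'"', b'\\', &slice[index..]) {
            Some(offset) => index + offset,
            None => slice.len(),
        };
        // ... then a separate check that the intervening bytes satisfy
        // ch >= 0x20; a control character ends the run early.
        while index < stop && slice[index] >= 0x20 {
            index += 1;
        }
        index
    }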

@dtolnay merged commit b0d678c into serde-rs:master on Jul 28, 2024
13 checks passed
@purplesyringa deleted the efficient-position branch on July 29, 2024 05:17