
codec/proto: reuse of marshal byte buffers #3167

Merged (8 commits) on Dec 20, 2019

Conversation

@adtac (Contributor) commented Nov 8, 2019

Performance benchmarks can be found below. Obviously, an 8 KiB
request/response is tailored to showcase this improvement, as this is
where codec buffer reuse shines, but I've run other benchmarks too (like
1-byte requests and responses) and there's no discernible impact on
performance.

We do not allow reuse of buffers when stats handlers or binlogs are
turned on. This is because those two may need access to the data and
payload even after the data has been written to the wire. In such cases,
we never return the data back to the pool.

A buffer reuse threshold of 1 KiB was determined after several
experiments. There are diminishing returns when buffer reuse is enabled
for smaller messages (in fact, a negative impact).

unary-networkMode_none-bufConn_false-keepalive_false-benchTime_40s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_6-reqSize_8192B-respSize_8192B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps       839638       906223     7.93%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op    103788.29     80592.47   -22.35%
           Allocs/op       183.33       189.30     3.27%
             ReqT/op 1375662899.20 1484755763.20     7.93%
            RespT/op 1375662899.20 1484755763.20     7.93%
            50th-Lat    238.746µs    225.019µs    -5.75%
            90th-Lat    514.253µs    456.439µs   -11.24%
            99th-Lat    711.083µs    702.466µs    -1.21%
             Avg-Lat     285.45µs    264.456µs    -7.35%

Closes #2816

@adtac force-pushed the adtac/codec-reuse branch 11 times, most recently from 13e18cb to 506dba6 on November 12, 2019
@adtac (Contributor, Author) commented Nov 12, 2019

Well, this turned out to be much more complicated than I originally thought:

  • We can't have buffer reuse when stats handlers and binlogs are enabled, because these can store references to the buffer internally.
  • Buffer reuse is tricky when client retries are involved.

Previously, I returned the buffer back to the codec layer as soon as the transport layer (read: loopy) was done with the buffer, but that doesn't work with retries, because the buffered slice of retry funcs will retain references to the payload, which may actually be the plain encoded data if no compression is used. Therefore, we may return the buffers to the encoder only after the entire RPC has concluded, so we need to store all the buffers until then.

@menghanl menghanl added the Type: Performance Performance improvements (CPU, network, memory, etc) label Nov 12, 2019
@menghanl menghanl added this to the 1.26 Release milestone Nov 12, 2019
@adtac (Contributor, Author) commented Nov 14, 2019

Why is bufferReuseThreshold = 1024? Because reusing buffers below that size negatively impacts performance, while anything higher wastes memory in cases where buffer reuse would help.

(Attached benchmark graphs: qps, bytes, 50th and 90th percentile latency.)

@adtac force-pushed the adtac/codec-reuse branch 2 times, most recently from 611e345 to 6b06f6e on November 14, 2019
@dfawley dfawley self-requested a review November 14, 2019 21:08
@dfawley dfawley self-assigned this Nov 14, 2019
@adtac force-pushed the adtac/codec-reuse branch 2 times, most recently from 9839ca3 to 47a9877 on November 15, 2019
Resolved review threads on: encoding/encoding.go, codec.go, encoding/proto/proto.go, stream.go, preloader.go, internal/leakcheck/leakcheck.go, internal/transport/controlbuf.go, server.go, vet.sh
Adhityaa Chandrasekar added 2 commits November 19, 2019 16:31
@dfawley (Member) left a comment


A few more comments; otherwise LGTM. Note that vet (the Go linter) is failing because of some comments.

Resolved review threads on: internal/transport/transport.go, stream.go
@dfawley dfawley assigned adtac and unassigned dfawley Nov 22, 2019
@adtac (Contributor, Author) commented Nov 23, 2019

@dfawley thanks for the review, I've addressed all comments. CI should pass this time. PTAL; if you'd like to merge this in the future, let me know and I'll squash everything into two logical commits -- one for the leaktest change, one for the actual reuse change.

@adtac adtac changed the title codec/proto: allow reuse of marshal byte buffers codec/proto: reuse of marshal byte buffers Dec 2, 2019
@menghanl menghanl assigned dfawley and unassigned adtac Dec 5, 2019
@dfawley dfawley modified the milestones: 1.26 Release, 1.27 Release Dec 17, 2019
@dfawley (Member) commented Dec 18, 2019

FYI: Travis is showing a data race that looks like it's caused by this PR. Can you take a look?

@adtac (Contributor, Author) commented Dec 19, 2019

I literally can't reproduce this locally. I've run the failing test 8000 times (yes, 8000) with the race detector turned on, and every single run passed. And now Travis is green too. I don't want to say it's a memory corruption bug, but...

Resolved review thread on: codec.go
@dfawley (Member) commented Dec 20, 2019

I literally can't reproduce this locally. I've run the failing test 8000 times (yes, 8000) with the race detector turned on and every single one test passed. And now Travis is green too. I don't want to say it's a memory corruption bug, but...

travis? 🤷‍♂️

@dfawley dfawley merged commit 6426751 into grpc:master Dec 20, 2019
@tmthrgd commented Dec 21, 2019

Using a sync.Pool in this way is almost certainly incorrect and liable to cause performance problems and excessive heap growth: golang/go#23199 and golang/go#27735.

menghanl added a commit to menghanl/grpc-go that referenced this pull request Jan 27, 2020
@menghanl menghanl mentioned this pull request Jan 27, 2020
@menghanl menghanl removed this from the 1.27 Release milestone Jan 27, 2020
menghanl added a commit that referenced this pull request Jan 27, 2020
* Revert "stream: fix returnBuffers race during retry (#3293)"

This reverts commit ede71d5.

* Revert "codec/proto: reuse of marshal byte buffers (#3167)"

This reverts commit 6426751.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 25, 2021
Labels: Type: Performance (performance improvements: CPU, network, memory, etc.)
Linked issue: stream: allow codecs to reuse output buffers (#2816)
4 participants