Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Framing benchmarks and optimizations #1016

Merged

Conversation

stebet
Copy link
Contributor

@stebet stebet commented Feb 11, 2021

Proposed Changes

Added benchmarks for frame generation and added optimizations which should benefit all commands sent from the clients.

Types of Changes

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

Main optimizations were done by reducing method calls, adding inlining and reducing code branches. There is definitely room for more improvements, but this is a pretty solid start :)

Benchmark results for serializing BasicAck, BasicDeliver and ChannelClose to their actual frames:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.21277
Intel Core i7-10700 CPU 2.90GHz, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.103
  [Host]     : .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT
  .NET 4.8 = .NET Framework 4.8 (4.8.4261.0), X64 RyuJIT
  Core 3.1 = .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT

Before

Method Runtime Mean Error StdDev Code Size Gen 0 Gen 1 Gen 2 Allocated
ChannelCloseWrite .NET 4.8 82.41 ns 0.699 ns 0.653 ns 1817 B 0.0088 - - 56 B
ChannelCloseWrite Core 3.1 31.48 ns 0.350 ns 0.327 ns 1117 B 0.0067 - - 56 B
BasicPublishWrite .NET 4.8 225.96 ns 0.905 ns 0.755 ns 1817 B 0.0241 - - 152 B
BasicPublishWrite Core 3.1 116.96 ns 0.684 ns 0.640 ns 1117 B 0.0181 - - 152 B
BasicPublishMemoryWrite .NET 4.8 162.78 ns 1.383 ns 1.226 ns 1817 B 0.0241 - - 152 B
BasicPublishMemoryWrite Core 3.1 71.53 ns 0.239 ns 0.200 ns 1117 B 0.0181 - - 152 B
BasicAckWrite .NET 4.8 70.30 ns 0.504 ns 0.472 ns 1817 B 0.0088 - - 56 B
BasicAckWrite Core 3.1 27.09 ns 0.508 ns 0.450 ns 1117 B 0.0067 - - 56 B

After:

Method Runtime Mean Diff Error StdDev Code Size Gen 0 Gen 1 Gen 2 Allocated
ChannelCloseWrite .NET 4.8 69.72 ns -15,4% 0.440 ns 0.411 ns 759 B 0.0088 - - 56 B
ChannelCloseWrite Core 3.1 23.36 ns -25,8% 0.482 ns 0.691 ns 1195 B 0.0067 - - 56 B
BasicPublishWrite .NET 4.8 200.58 ns -11,2% 0.687 ns 0.574 ns 2814 B 0.0241 - - 152 B
BasicPublishWrite Core 3.1 98.16 ns -16,1% 0.381 ns 0.356 ns 2438 B 0.0181 - - 152 B
BasicPublishMemoryWrite .NET 4.8 133.69 ns -17,9% 0.336 ns 0.314 ns 2942 B 0.0241 - - 152 B
BasicPublishMemoryWrite Core 3.1 55.57 ns -22,3% 0.503 ns 0.470 ns 2438 B 0.0181 - - 152 B
BasicAckWrite .NET 4.8 58.54 ns -16,7% 0.245 ns 0.229 ns 759 B 0.0089 - - 56 B
BasicAckWrite Core 3.1 18.06 ns -33.3% 0.381 ns 0.558 ns 1195 B 0.0067 - - 56 B

@bollhals
Copy link
Contributor

Nice improvements! Looks very much worth it.

Some of your improvements I also applied in #962, so we should look how we coordinate these two PRs to get the maximum out of it 🔥 🚀

Out of curiosity, while changing it, what change did bring the most benefit? Is shifting & then write as ulong that much faster?

@stebet
Copy link
Contributor Author

stebet commented Feb 11, 2021

The UInt64 change helped get rid of method calls and calls to Span.Slice, which even if they are cheap, add up quickly on a hot-path, made the Write* calls smaller (Code size) which made them more inlineable.

The biggest factor was splitting up OutgoingCommand to reduce code branches (ifs) and extra method calls.

I have an idea for a few more optimizations which I'll try out tomorrow.

@stebet stebet force-pushed the framingbenchmarksandoptimizations branch from e08aa7f to 3d920cd Compare February 12, 2021 14:47
@stebet
Copy link
Contributor Author

stebet commented Feb 12, 2021

Rebased and squashed after reviews.

@stebet stebet force-pushed the framingbenchmarksandoptimizations branch from 6f9ddb0 to e02e86f Compare February 12, 2021 14:50
@stebet
Copy link
Contributor Author

stebet commented Feb 12, 2021

Updated with latest perf numbers

@stebet
Copy link
Contributor Author

stebet commented Feb 18, 2021

Anyone want to do a final review of this? If not, I think this PR is ready :) @michaelklishin @danielmarbach @bollhals

@bollhals
Copy link
Contributor

Go for it. I'll update #962 afterwards to pickup these improvements and extend where possible.

@michaelklishin michaelklishin merged commit 9437b15 into rabbitmq:master Feb 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants