
Profile-Guided Optimization (PGO) evaluation #1918

Open
zamazan4ik opened this issue Sep 12, 2023 · 3 comments
Labels
documentation Improvements or additions to documentation

Comments

@zamazan4ik

Hi!

I recently ran a lot of Profile-Guided Optimization (PGO) benchmarks on different kinds of software; all currently available results are collected at https://github.com/zamazan4ik/awesome-pgo . According to these tests, PGO usually helps achieve better performance, which is why evaluating PGO for Hurl seems worthwhile. I ran some benchmarks on my local machine and want to share the results.

Test environment

  • Apple Macbook M1 (full charge, AC-connected)
  • macOS 13.4 Ventura
  • Rust: 1.72
  • Latest hurl from the master branch (commit 7ed25baac9934a1a86f61a8f4bedcdc76dbaa2a2 )

Test workload

As a test scenario, I used the benches from the repo. The only differences are an increased request count (10k) and an Axum-based HTTP server instead of Flask (on my machine, the Flask-based server is overloaded by these benchmarks and gets stuck after a few moments).

All runs are performed on the same hardware, operating system, and the same background workload (as much as I can guarantee this on macOS). All PGO optimizations are done with cargo-pgo. The profile information was collected from the benchmarks as well.

Results

Here are the results in Hyperfine format, comparing the PGO-optimized binary to the release binary (the PGO-optimized binary is faster, as the results below show):

Benchmark 1: /Users/zamazan4ik/open_source/hurl/target/aarch64-apple-darwin/release/hurl tests/hello_10000.hurl
  Time (mean ± σ):      3.260 s ±  0.111 s    [User: 1.471 s, System: 0.666 s]
  Range (min … max):    3.106 s …  3.459 s    20 runs

Benchmark 2: /Users/zamazan4ik/open_source/hurl/target/release/hurl tests/hello_10000.hurl
  Time (mean ± σ):      3.726 s ±  0.282 s    [User: 1.783 s, System: 0.707 s]
  Range (min … max):    3.410 s …  4.505 s    20 runs

Summary
  /Users/zamazan4ik/open_source/hurl/target/aarch64-apple-darwin/release/hurl tests/hello_10000.hurl ran
    1.14 ± 0.10 times faster than /Users/zamazan4ik/open_source/hurl/target/release/hurl tests/hello_10000.hurl
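As a quick sanity check, the 1.14 figure in Hyperfine's summary is simply the ratio of the two mean wall times (a minimal sketch; the values are copied from the output above):

```python
# Recompute Hyperfine's summary ratio from the reported mean times.
pgo_mean = 3.260      # mean wall time of the PGO-optimized binary (s)
release_mean = 3.726  # mean wall time of the plain release binary (s)

speedup = release_mean / pgo_mean
print(f"{speedup:.2f}")  # → 1.14, i.e. ~14% faster with PGO
```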

Some conclusions

  • PGO shows great improvements in Hurl performance, at least in the benchmarks provided by the project. I expect similar results in other scenarios.

Further steps

I suggest doing the following:

  • Add a note to the Hurl documentation about building with PGO. In this case, users and maintainers who build their own Hurl binaries will be aware of PGO as an additional way to optimize the project.
  • Optimize the binaries provided by the Hurl project with PGO on CI (as is already done for other projects such as Rustc), if such binaries are provided.
  • Try to evaluate LLVM BOLT in addition to PGO on Hurl.
@jcamiel
Collaborator

jcamiel commented Jan 10, 2024

Hi @zamazan4ik, could you propose a small text for the documentation, for the item "Add a note to the Hurl documentation about building with PGO"?

@zamazan4ik
Author

Hi @zamazan4ik, could you propose a small text for the documentation, for the item "Add a note to the Hurl documentation about building with PGO"?

Sure!

Firstly, I want to share the existing PGO-oriented documentation from other projects:

I hope you can find something useful in the examples above.

As for the text itself, I suggest it answer the following questions:

  • What is PGO? A link to the Rustc documentation should be enough, IMHO
  • What benefits does PGO bring to Hurl? Here we can reference this issue with actual benchmarks
  • How to build Hurl with PGO? Here we can write simple instructions for building Hurl with PGO via cargo-pgo or with the raw PGO-related compiler options (see the Rustc documentation)
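For the "raw compiler options" route mentioned above, a minimal sketch could look like this (the profile directory and workload file are example paths I chose for illustration, not project conventions; `llvm-profdata` typically comes with LLVM or the rustup `llvm-tools` component):

```shell
# Sketch of PGO with raw rustc flags instead of cargo-pgo.
# /tmp/pgo-data and tests/hello.hurl are placeholder paths.

# 1. Build an instrumented binary.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# 2. Run a representative workload to collect profiles.
./target/release/hurl tests/hello.hurl

# 3. Merge the raw profiles into a single .profdata file.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild using the merged profile.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```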

Where should this instruction go? I guess somewhere in the "Building from sources" documentation.

So I think the text could look like this (as a reference I used Vector documentation about PGO):

"Profile-Guided Optimization (PGO) is a compiler optimization technique where a program is optimized based on the runtime profile.

According to the tests, we see improvements of up to 20% faster request executions in the benchmark. The performance benefits depend on your typical workload - you can get better or worse results.

More information about PGO in Hurl can be found in the corresponding GitHub issue.

How to build Hurl with PGO?

There are two major kinds of PGO: Instrumentation and Sampling (also known as AutoFDO). This guide describes Instrumentation PGO for Hurl. We use cargo-pgo to build Hurl with PGO.

  • Install cargo-pgo.
  • Check out the Hurl repository.
  • Go to the Hurl source directory and run cargo pgo build. It will build the instrumented Hurl version.
  • Run instrumented Hurl on your test load. Usually, performing several workload-representative requests is enough to collect a good PGO profile (but your case can be different).
  • Run cargo pgo optimize. It will build Hurl with PGO optimization.
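The steps above could be condensed into a shell sketch like this (the workload file is a placeholder to replace with your own representative requests; the instrumented binary lands under a target-triple subdirectory, hence the wildcard):

```shell
# Condensed version of the cargo-pgo steps above (workload path is a placeholder).
cargo install cargo-pgo                              # 1. install cargo-pgo
git clone https://github.com/Orange-OpenSource/hurl.git && cd hurl   # 2. check out Hurl
cargo pgo build                                      # 3. build instrumented Hurl
./target/*/release/hurl your_workload.hurl           # 4. run a representative workload
cargo pgo optimize                                   # 5. rebuild with the collected profile
```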

A more detailed guide on how to apply PGO is in the Rust documentation."

I think having something like this in the documentation is fine.

@jcamiel
Collaborator

jcamiel commented Jan 11, 2024

Thanks a lot, I'll put all this in the repo under "contrib", and link it in the documentation!

@jcamiel jcamiel added the documentation Improvements or additions to documentation label Jan 12, 2024