Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) benchmark results #74

Closed
zamazan4ik opened this issue May 13, 2024 · 1 comment

Comments

@zamazan4ik

Hi!

Recently I tested Profile-Guided Optimization (PGO) on many projects across different software domains - all the results are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO shows measurable improvements in many cases, I decided to run PGO benchmarks on this library as well (especially because I found some performance numbers). Here are my results - I hope they will be helpful to someone.

Test environment

  • Fedora 39
  • Linux kernel 6.8.9
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.78.0
  • prettyplease version: the current master branch at commit 179974cc93c8d54894483463c1eea4df9e70a694
  • Turbo Boost disabled

Benchmark

For the benchmark, I use the test scenario built into the project, which reformats several projects (described below). I used this tool because I needed an executable to optimize. As benchmark datasets, I used two projects, Vector and grafbase (both are big enough, and both were already checked out on my PC): Vector on the master branch at commit 5a4a2b2a10131af7ef4ca32ff13b9040e231f5a6, and grafbase on the main branch at commit 5605d62f69790f62a385e8155bddf838f977165b. For PGO I use the cargo-pgo tool.

I got the Release benchmark result with the taskset -c 0 prettyplease-update command. The PGO training phase is done with taskset -c 0 prettyplease-update using the instrumented binary, and the PGO optimization phase also uses taskset -c 0 prettyplease-update. taskset -c 0 pins the process to a single CPU core to reduce the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as far as I can guarantee).

I also decided to test LTO on the project. I enabled LTO with the following lines in the root Cargo.toml:

[profile.release]
codegen-units = 1
lto = true

Results

I got the following results when formatting the grafbase sources, with the Vector sources as the PGO training set:

hyperfine --warmup 5 --min-runs 10 'taskset -c 0 ../prettyplease/target/update_release' 'taskset -c 0 ../prettyplease/target/update_release_lto' 'taskset -c 0 ../prettyplease/target/update_release_lto_pgo_instrumented' 'taskset -c 0 ../prettyplease/target/update_release_lto_pgo_optimized' 'taskset -c 0 ../prettyplease/target/update_release_lto_pgo_optimized_bolt_instrumented' 'taskset -c 0 ../prettyplease/target/update_release_lto_pgo_optimized_bolt_optimized'
Benchmark 1: taskset -c 0 ../prettyplease/target/update_release
  Time (mean ± σ):     775.1 ms ±   2.2 ms    [User: 695.3 ms, System: 76.7 ms]
  Range (min … max):   770.1 ms … 778.0 ms    10 runs

Benchmark 2: taskset -c 0 ../prettyplease/target/update_release_lto
  Time (mean ± σ):     677.3 ms ±   2.2 ms    [User: 597.9 ms, System: 76.7 ms]
  Range (min … max):   674.4 ms … 680.2 ms    10 runs

Benchmark 3: taskset -c 0 ../prettyplease/target/update_release_lto_pgo_instrumented
  Time (mean ± σ):     869.7 ms ±   7.8 ms    [User: 776.9 ms, System: 83.1 ms]
  Range (min … max):   863.3 ms … 887.2 ms    10 runs

Benchmark 4: taskset -c 0 ../prettyplease/target/update_release_lto_pgo_optimized
  Time (mean ± σ):     631.0 ms ±   2.3 ms    [User: 543.7 ms, System: 79.3 ms]
  Range (min … max):   627.1 ms … 635.1 ms    10 runs

Benchmark 5: taskset -c 0 ../prettyplease/target/update_release_lto_pgo_optimized_bolt_instrumented
  Time (mean ± σ):      1.489 s ±  0.006 s    [User: 1.221 s, System: 0.238 s]
  Range (min … max):    1.479 s …  1.503 s    10 runs

Benchmark 6: taskset -c 0 ../prettyplease/target/update_release_lto_pgo_optimized_bolt_optimized
  Time (mean ± σ):     623.4 ms ±   3.7 ms    [User: 529.9 ms, System: 85.6 ms]
  Range (min … max):   618.8 ms … 630.6 ms    10 runs

Summary
  taskset -c 0 ../prettyplease/target/update_release_lto_pgo_optimized_bolt_optimized ran
    1.01 ± 0.01 times faster than taskset -c 0 ../prettyplease/target/update_release_lto_pgo_optimized
    1.09 ± 0.01 times faster than taskset -c 0 ../prettyplease/target/update_release_lto
    1.24 ± 0.01 times faster than taskset -c 0 ../prettyplease/target/update_release
    1.40 ± 0.02 times faster than taskset -c 0 ../prettyplease/target/update_release_lto_pgo_instrumented
    2.39 ± 0.02 times faster than taskset -c 0 ../prettyplease/target/update_release_lto_pgo_optimized_bolt_instrumented

where:

  • update_release: Release
  • update_release_lto: Release + LTO
  • update_release_lto_pgo_optimized: Release + LTO + PGO
  • update_release_lto_pgo_optimized_bolt_optimized: Release + LTO + PGO + BOLT
  • (just for reference) update_release_lto_pgo_instrumented: Release + LTO + PGO instrumentation
  • (just for reference) update_release_lto_pgo_optimized_bolt_instrumented: Release + LTO + PGO + BOLT instrumentation

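According to the results, LTO and PGO measurably improve performance, at least in the simple benchmark above: LTO cuts the mean run time from 775.1 ms to 677.3 ms (about 13%), and PGO on top of LTO brings it down to 631.0 ms (about 7% more). BOLT also improves performance, but only slightly: 623.4 ms, about 1% better than LTO + PGO.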
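If you want to recompute the relative gains yourself, here is a minimal sketch that derives them from the mean times in the hyperfine output above. The helper name time_reduction_percent and the RESULTS table are just illustrative names of mine, not part of prettyplease or hyperfine, and the numbers are copied by hand from the report.

```rust
/// Mean wall-clock times in milliseconds, copied by hand from the
/// hyperfine output above (instrumented builds are left out).
const RESULTS: [(&str, f64); 4] = [
    ("Release", 775.1),
    ("Release + LTO", 677.3),
    ("Release + LTO + PGO", 631.0),
    ("Release + LTO + PGO + BOLT", 623.4),
];

/// Percentage by which `new_ms` is shorter than `old_ms`.
/// (Hypothetical helper, introduced only for this example.)
fn time_reduction_percent(old_ms: f64, new_ms: f64) -> f64 {
    (old_ms - new_ms) / old_ms * 100.0
}

fn main() {
    let baseline = RESULTS[0].1;
    let mut previous = baseline;
    for (name, mean_ms) in RESULTS {
        // Compare each build against plain Release and against the previous step.
        println!(
            "{name:<28} {mean_ms:>6.1} ms  vs Release: {:>5.1}%  vs previous step: {:>5.1}%",
            time_reduction_percent(baseline, mean_ms),
            time_reduction_percent(previous, mean_ms),
        );
        previous = mean_ms;
    }
}
```

Note that this measures the reduction in run time, which is a different way of expressing the same data as the "times faster" ratios in the hyperfine summary.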

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks with other datasets (if you are interested enough). If they show improvements, add a note to the documentation (the README file?) about the possible performance gains for the library with PGO.
  • You could probably get some insight into how the code can be optimized further by looking at the changes the compiler made with PGO, for example by comparing flamegraphs before and after applying PGO. I don't think anything valuable for this library can be improved this way, though.

I would be happy to answer your questions about PGO and PLO.

P.S. Please do not treat this issue as a bug report or anything like that. Since the "Discussions" feature is disabled in this repo, I created an issue instead.

@dtolnay
Owner

dtolnay commented May 13, 2024

Thanks!

@dtolnay dtolnay closed this as completed May 13, 2024