
Split benchmark suite build target into default and long-running #13129

Merged
merged 2 commits into from
May 23, 2023

Conversation

pzread
Contributor

@pzread pzread commented Apr 17, 2023

Split `iree-benchmark-suites` into `iree-benchmark-suites` and `iree-benchmark-suites-long`. The former excludes the long-running benchmarks; the latter includes only those.

The long-running benchmarks take more than 100 GB to build, wasting both GCS storage (we upload them) and time in presubmit.

This change still builds all targets in `build_e2e_test_artifacts`. Follow-up changes will build only `iree-benchmark-suites` by default in presubmit.
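The split boils down to partitioning the generated benchmark configs by a long-running tag. A minimal sketch in Python (hypothetical, simplified names; `BenchmarkConfig` and `split_suites` stand in for the real IREE benchmark framework definitions):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Tag marking benchmarks that are too expensive for the default suite.
LONG_RUNNING = "long-running"


@dataclass
class BenchmarkConfig:
    """Hypothetical stand-in for a generated benchmark config."""
    name: str
    tags: List[str] = field(default_factory=list)


def split_suites(
    configs: List[BenchmarkConfig],
) -> Tuple[List[BenchmarkConfig], List[BenchmarkConfig]]:
    """Partitions configs into the default and long-running suites."""
    default = [c for c in configs if LONG_RUNNING not in c.tags]
    long_running = [c for c in configs if LONG_RUNNING in c.tags]
    return default, long_running


configs = [
    BenchmarkConfig("MobileNetV3Small"),
    BenchmarkConfig("BertLarge", tags=[LONG_RUNNING]),
]
default_suite, long_suite = split_suites(configs)
```

`default_suite` would then feed the `iree-benchmark-suites` target and `long_suite` the `iree-benchmark-suites-long` target.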

@pzread pzread added benchmarks:cuda Run default CUDA benchmarks benchmarks:x86_64 Run default x86_64 benchmarks benchmarks:comp-stats Run default compilation statistics benchmarks labels Apr 17, 2023
@github-actions

github-actions bot commented Apr 17, 2023

Abbreviated Benchmark Summary

@ commit 869eb873461b3b8f836de4c7f0cc67294194504d (vs. base a345bfa696b520fc65c7743a94af6d61276a2576)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags] local\_task(embedded\_elf)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[little-core] | 25.184 (vs. 21.994, 14.50%↑) | 25.384 | 0.886 |
| BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 451.777 (vs. 395.203, 14.32%↑) | 452.648 | 6.183 |
| MobileNetV3Small\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with zeros @ moto-edge-x30[gpu] | 5.025 (vs. 4.686, 7.25%↑) | 5.090 | 0.412 |

[Top 3 out of 5 results showed]

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 1018.159 (vs. 1239.801, 17.88%↓) | 1014.282 | 21.102 |
| MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 5003.644 (vs. 5851.031, 14.48%↓) | 5047.260 | 145.482 |
| PoseNet\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags] local\_sync(embedded\_elf)[full-inference,default-flags] with zeros @ pixel-4[big-core] | 273.181 (vs. 300.340, 9.04%↓) | 272.799 | 1.888 |

[Top 3 out of 9 results showed]

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run

@pzread pzread force-pushed the bench-split-target branch 4 times, most recently from a3135d6 to 9e8ba25 Compare May 15, 2023 20:11
@pzread pzread changed the title [WIP] Split benchmark suites into default and long-running Split benchmark suites into default and long-running May 15, 2023
@pzread pzread force-pushed the bench-split-target branch 4 times, most recently from 5fd8e1f to e1252da Compare May 15, 2023 21:44
@pzread pzread removed the benchmarks:cuda Run default CUDA benchmarks label May 17, 2023
@pzread pzread force-pushed the bench-split-target branch 4 times, most recently from 0d3373e to 09c4c2c Compare May 18, 2023 19:01
```diff
-imported_model=iree_definitions.ImportedModel.from_model(model))
-for model in models
+imported_model=iree_definitions.ImportedModel.from_model(model),
+tags=tags) for model in models
```
Contributor Author

@pzread pzread May 18, 2023


Instead of using tags, I think we should have a `presets` field on module generation and run configs to indicate which benchmark preset they belong to. I'm planning some overhaul changes later to refactor these (#13683).

Here I'm implementing with what we have now to unblock #13581.

Contributor


Hmmm but that means each model only belongs to exactly one preset

Contributor Author


The presets can be a list, so each benchmark can still belong to multiple presets if needed.
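The presets-as-a-list idea could look like the following sketch (hypothetical field and helper names, not the actual #13683 implementation; the real configs live in `iree_definitions`):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class E2EModelRunConfig:
    """Simplified stand-in for a benchmark run config."""
    name: str
    # A benchmark can belong to multiple presets.
    presets: List[str] = field(default_factory=list)


def select_preset(
    configs: List[E2EModelRunConfig], preset: str
) -> List[E2EModelRunConfig]:
    """Returns the configs that belong to the requested preset."""
    return [c for c in configs if preset in c.presets]


configs = [
    E2EModelRunConfig("mobilenet-x86_64", presets=["x86_64"]),
    E2EModelRunConfig("bert-cuda", presets=["cuda", "cuda-large"]),
]
```

A CI job asked to run the `cuda` preset would then pick up `bert-cuda` without that benchmark being excluded from the separate `cuda-large` preset.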

@pzread pzread added benchmarks:cuda Run default CUDA benchmarks benchmarks:android-cpu Run default Android CPU benchmarks benchmarks:android-gpu Run default Android GPU benchmarks benchmarks:vulkan-nvidia Run default Vulkan benchmarks on NVIDIA GPU labels May 18, 2023
@pzread pzread marked this pull request as ready for review May 18, 2023 19:19
@pzread pzread changed the title Split benchmark suites into default and long-running Split benchmark suite build target into default and long-running May 18, 2023
Contributor

@GMNGeoffrey GMNGeoffrey left a comment


Uuuuuh are we uploading 100GB per run on postsubmit also? Because then we're going to have to start looking at storage costs, not just latency.

@ScottTodd
Member

> Uuuuuh are we uploading 100GB per run on postsubmit also? Because then we're going to have to start looking at storage costs, not just latency.

Also mentioned/discussed a bit here: #12215 (comment)

```diff
-imported_model=iree_definitions.ImportedModel.from_model(model))
-for model in models
+imported_model=iree_definitions.ImportedModel.from_model(model),
+tags=tags) for model in models
```
Contributor


Hmmm but that means each model only belongs to exactly one preset

```diff
@@ -88,15 +88,15 @@ def generate(
 ) -> Tuple[List[iree_definitions.ModuleGenerationConfig],
            List[iree_definitions.E2EModelRunConfig]]:
   """Generates IREE compile and run configs."""
-  # The `vulkan-nvidia` tag is required to put them into the Vulkan NVIDIA
+  # The `vulkan-nvidia`` tag is required to put them into the Vulkan NVIDIA
```
Contributor


Suggested change:

```diff
-# The `vulkan-nvidia`` tag is required to put them into the Vulkan NVIDIA
+# The `vulkan-nvidia` tag is required to put them into the Vulkan NVIDIA
```

Revert extra backtick?

```python
COMPILE_STATS = "compile-stats"

# Tag for long-running benchmarks.
LONG_RUNNING = "long-running"
```
Contributor


Especially when speaking about compilation of the models, is "long-running" really the right name? To me that implies execution time. It seems like "large" would be simpler, shorter, and more accurate.

Contributor Author


Talked to @mariecwhite; I think we can rename this to `large`. I can do that in a follow-up change.

@benvanik
Collaborator

(Reminder that `unfoldable_constant` exists and there are other mechanisms we can use to strip/compress weights - we shouldn't really be using real weights on presubmits unless we're 100% confident there's a material difference.)

@pzread
Contributor Author

pzread commented May 18, 2023

> Uuuuuh are we uploading 100GB per run on postsubmit also? Because then we're going to have to start looking at storage costs, not just latency.

We can also not build the large benchmarks in postsubmit (or not upload them), if the cost is a concern.

@pzread pzread merged commit 43305f7 into iree-org:main May 23, 2023
NatashaKnk pushed a commit to NatashaKnk/iree that referenced this pull request Jul 6, 2023
…e-org#13129)

Split `iree-benchmark-suites` into `iree-benchmark-suites` and
`iree-benchmark-suites-long`. The former one doesn't include the
`long-running` benchmarks and the latter one only includes those.