
Split benchmark suite build target into default and long-running #13129

Merged
merged 2 commits into from
May 23, 2023

Conversation

pzread
Contributor

@pzread pzread commented Apr 17, 2023

Split `iree-benchmark-suites` into `iree-benchmark-suites` and `iree-benchmark-suites-long`. The former excludes the long-running benchmarks; the latter includes only those.

The long-running benchmarks take more than 100 GB to build, wasting both GCS storage (we upload them) and time in presubmit.

This change still builds all targets in `build_e2e_test_artifacts`. Follow-up changes will build only `iree-benchmark-suites` by default in presubmit.
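The split boils down to partitioning the generated benchmark configs by a long-running tag. A minimal sketch in Python (hypothetical, simplified names; `BenchmarkConfig` and `split_suites` stand in for the real IREE benchmark framework definitions):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Tag marking benchmarks that are too expensive for the default suite.
LONG_RUNNING = "long-running"


@dataclass
class BenchmarkConfig:
    """Hypothetical stand-in for a generated benchmark config."""
    name: str
    tags: List[str] = field(default_factory=list)


def split_suites(
    configs: List[BenchmarkConfig],
) -> Tuple[List[BenchmarkConfig], List[BenchmarkConfig]]:
    """Partitions configs into the default and long-running suites."""
    default = [c for c in configs if LONG_RUNNING not in c.tags]
    long_running = [c for c in configs if LONG_RUNNING in c.tags]
    return default, long_running


configs = [
    BenchmarkConfig("MobileNetV3Small"),
    BenchmarkConfig("BertLarge", tags=[LONG_RUNNING]),
]
default_suite, long_suite = split_suites(configs)
```

`default_suite` would then feed the `iree-benchmark-suites` target and `long_suite` the `iree-benchmark-suites-long` target.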

@pzread pzread added benchmarks:cuda Run default CUDA benchmarks benchmarks:x86_64 Run default x86_64 benchmarks benchmarks:comp-stats Run default compilation statistics benchmarks labels Apr 17, 2023
@github-actions

github-actions bot commented Apr 17, 2023

Abbreviated Benchmark Summary

@ commit 869eb873461b3b8f836de4c7f0cc67294194504d (vs. base a345bfa696b520fc65c7743a94af6d61276a2576)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags] local\_task(embedded\_elf)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[little-core] | 25.184 (vs. 21.994, 14.50%↑) | 25.384 | 0.886 |
| BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 451.777 (vs. 395.203, 14.32%↑) | 452.648 | 6.183 |
| MobileNetV3Small\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with zeros @ moto-edge-x30[gpu] | 5.025 (vs. 4.686, 7.25%↑) | 5.090 | 0.412 |

[Top 3 out of 5 results showed]

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 1018.159 (vs. 1239.801, 17.88%↓) | 1014.282 | 21.102 |
| MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 5003.644 (vs. 5851.031, 14.48%↓) | 5047.260 | 145.482 |
| PoseNet\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags] local\_sync(embedded\_elf)[full-inference,default-flags] with zeros @ pixel-4[big-core] | 273.181 (vs. 300.340, 9.04%↓) | 272.799 | 1.888 |

[Top 3 out of 9 results showed]

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run

@pzread pzread force-pushed the bench-split-target branch 4 times, most recently from a3135d6 to 9e8ba25 Compare May 15, 2023 20:11
@pzread pzread changed the title [WIP] Split benchmark suites into default and long-running Split benchmark suites into default and long-running May 15, 2023
@pzread pzread force-pushed the bench-split-target branch 4 times, most recently from 5fd8e1f to e1252da Compare May 15, 2023 21:44
@pzread pzread removed the benchmarks:cuda Run default CUDA benchmarks label May 17, 2023
@pzread pzread force-pushed the bench-split-target branch 4 times, most recently from 0d3373e to 09c4c2c Compare May 18, 2023 19:01
```diff
-imported_model=iree_definitions.ImportedModel.from_model(model))
-for model in models
+imported_model=iree_definitions.ImportedModel.from_model(model),
+tags=tags) for model in models
```
Contributor Author

@pzread pzread May 18, 2023


Instead of using tags, I think we should have a `presets` field on module generation and run configs to indicate which benchmark preset they belong to. I'm planning some overhaul changes later to refactor these (#13683).

Here I'm implementing with what we have now to unblock #13581.

Contributor


Hmmm but that means each model only belongs to exactly one preset

Contributor Author


The presets can be a list, so each benchmark can still belong to multiple presets if needed.
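The presets-as-a-list idea could look like the following sketch (hypothetical field and helper names, not the actual #13683 implementation; the real configs live in `iree_definitions`):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class E2EModelRunConfig:
    """Simplified stand-in for a benchmark run config."""
    name: str
    # A benchmark can belong to multiple presets.
    presets: List[str] = field(default_factory=list)


def select_preset(
    configs: List[E2EModelRunConfig], preset: str
) -> List[E2EModelRunConfig]:
    """Returns the configs that belong to the requested preset."""
    return [c for c in configs if preset in c.presets]


configs = [
    E2EModelRunConfig("mobilenet-x86_64", presets=["x86_64"]),
    E2EModelRunConfig("bert-cuda", presets=["cuda", "cuda-large"]),
]
```

A CI job asked to run the `cuda` preset would then pick up `bert-cuda` without that benchmark being excluded from the separate `cuda-large` preset.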

@pzread pzread added benchmarks:cuda Run default CUDA benchmarks benchmarks:android-cpu Run default Android CPU benchmarks benchmarks:android-gpu Run default Android GPU benchmarks benchmarks:vulkan-nvidia Run default Vulkan benchmarks on NVIDIA GPU labels May 18, 2023
@pzread pzread marked this pull request as ready for review May 18, 2023 19:19
@pzread pzread changed the title Split benchmark suites into default and long-running Split benchmark suite build target into default and long-running May 18, 2023
Contributor

@GMNGeoffrey GMNGeoffrey left a comment


Uuuuuh are we uploading 100GB per run on postsubmit also? Because then we're going to have to start looking at storage costs, not just latency.

@ScottTodd
Member

> Uuuuuh are we uploading 100GB per run on postsubmit also? Because then we're going to have to start looking at storage costs, not just latency.

Also mentioned/discussed a bit here: #12215 (comment)

```diff
-imported_model=iree_definitions.ImportedModel.from_model(model))
-for model in models
+imported_model=iree_definitions.ImportedModel.from_model(model),
+tags=tags) for model in models
```
Contributor


Hmmm but that means each model only belongs to exactly one preset

```diff
@@ -88,15 +88,15 @@ def generate(
 ) -> Tuple[List[iree_definitions.ModuleGenerationConfig],
            List[iree_definitions.E2EModelRunConfig]]:
   """Generates IREE compile and run configs."""
-  # The `vulkan-nvidia` tag is required to put them into the Vulkan NVIDIA
+  # The `vulkan-nvidia`` tag is required to put them into the Vulkan NVIDIA
```
Contributor


Suggested change:

```diff
-# The `vulkan-nvidia`` tag is required to put them into the Vulkan NVIDIA
+# The `vulkan-nvidia` tag is required to put them into the Vulkan NVIDIA
```

Revert extra backtick?

```python
COMPILE_STATS = "compile-stats"

# Tag for long-running benchmarks.
LONG_RUNNING = "long-running"
```
Contributor


Especially when speaking about compilation of the models, is "long-running" really the right name? To me that implies execution time. It seems like "large" would be simpler, shorter, and more accurate.

Contributor Author


Talked to @mariecwhite; I think we can rename this to `large`. I can do that in a follow-up change.

@benvanik
Collaborator

(Reminder that `unfoldable_constant` exists and there are other mechanisms we can use to strip/compress weights - we shouldn't really be using real weights on presubmits unless we're 100% confident there's a material difference.)

@pzread
Contributor Author

pzread commented May 18, 2023

> Uuuuuh are we uploading 100GB per run on postsubmit also? Because then we're going to have to start looking at storage costs, not just latency.

We can also not build the large benchmarks in postsubmit (or not upload them), if the cost is a concern.

@pzread pzread merged commit 43305f7 into iree-org:main May 23, 2023
NatashaKnk pushed a commit to NatashaKnk/iree that referenced this pull request Jul 6, 2023
…e-org#13129)

Split `iree-benchmark-suites` into `iree-benchmark-suites` and
`iree-benchmark-suites-long`. The former one doesn't include the
`long-running` benchmarks and the latter one only includes those.