
UX improvements for the new benchmark pipeline #12215

Closed · 10 of 19 tasks
pzread opened this issue Feb 15, 2023 · 4 comments
@pzread (Contributor) commented Feb 15, 2023

Since we moved to artificial IDs in the new benchmark pipeline, it isn't very friendly for users to figure out the details of a benchmark from those opaque IDs alone.

To improve the UX, we can add more information to the benchmark config files and provide tools that help users retrieve benchmark details.

P0

P1

  • Benchmark display names should be generated independently of run_benchmarks_on_*.py (Use benchmark name from benchmark config object #12723)
  • Support more x86_64 architectures for local development (e.g. CPUs without AVX-512)
  • Export the benchmark config when building the e2e_test_artifacts

P2

  • Improve the readability of the serialized format
  • More structural filters in benchmark tools
  • Derive CPU uarch from device spec
  • Clearer and more concise benchmark summary comments
  • Show timestamp on the benchmark summary
  • Docs explaining the structure of the new Python benchmark framework
  • Docs for the new benchmark suite structure and how to hack on it
@pzread added the infrastructure and infrastructure/benchmark labels Feb 15, 2023
@pzread self-assigned this Feb 15, 2023
@pzread (Contributor, Author) commented Mar 15, 2023

Dumping some thoughts about the traceability of SHA IDs and the naming issues for artifacts and benchmarks.

Background

E2E Test Artifacts

${IREE_BUILD_DIR}/e2e_test_artifacts is the central place to store all fetched
files and artifacts built in generated_e2e_test_iree_artifacts.cmake. The
script generate_cmake_e2e_test_artifacts_suite.py collects all
ModuleGenerationConfig (right now only from the benchmark suite) and converts
them into cmake build rules.

Each ModuleGenerationConfig has a unique SHA256 ID computed from its
sub-components (CompileConfig and ImportedModel). This ID is used as the target
and output file name in the build rules.
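
For illustration, a minimal sketch of how such an ID could be derived; the field names here are hypothetical, not the actual e2e_test_framework definitions:

```python
# Hypothetical sketch: derive a stable SHA256 ID from a config's
# sub-component IDs. Field names are illustrative only.
import hashlib
import json

def compute_generation_config_id(compile_config_id: str,
                                 imported_model_id: str) -> str:
    # Serialize the sub-components deterministically and hash the result,
    # so the same inputs always map to the same target/file name.
    payload = json.dumps(
        {"compile_config": compile_config_id,
         "imported_model": imported_model_id},
        sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

print(compute_generation_config_id("compile-config-abc", "imported-model-def"))
```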

IREE Benchmark Suites

There are two types of benchmarks: execution and compilation. An execution
benchmark is defined by an E2EModelRunConfig and a compilation benchmark is
defined by a ModuleGenerationConfig (note that only some generation configs are
compilation benchmarks; we simply reuse the same object from module artifact
generation). Each benchmark has a permanent SHA256 ID.

To run the benchmarks, export_benchmark_config.py exports the serialized
E2EModelRunConfig and ModuleGenerationConfig objects for the benchmark tools.
The config file contains the full objects and their sub-components, mainly for
indexing and searching benchmarks by their metadata in the tools and in the
upcoming database.

Issues and Solutions

Traceability in E2E Test Artifacts

Since the build rules and output artifacts only carry the SHA256 ID in their
names, it is very hard to trace back which Python code generated a given rule.
Right now you can only guess from the flags, or write code that prints the ID
when a ModuleGenerationConfig is generated.

Using the SHA256 ID as the CMake target and file name has advantages: there is
no need to escape or replace disallowed characters, names have a fixed length,
and collisions are avoided. However, not being able to trace a rule back to its
source is a big flaw when debugging build errors.

Solutions

To make them traceable, ModuleGenerationConfig first needs to carry information
about where it was generated. A simple way is to give each config a readable
name, auto-generated from its sub-components, such as the model name and the
tags of the compile config. This approach also aligns with the requirement to
generate IREE benchmark names from the config. The readable name needs to
contain the model name, the target architecture, and the compile tags.
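
A minimal sketch of such a name generator (hypothetical signature, not the real framework code):

```python
# Hypothetical sketch: auto-generate a readable display name for a
# ModuleGenerationConfig from its sub-components.
from typing import Sequence

def generate_display_name(model_name: str, target_arch: str,
                          compile_tags: Sequence[str]) -> str:
    parts = [model_name, target_arch, *sorted(compile_tags)]
    # Keep the name filesystem- and CMake-friendly: lower-case, no spaces.
    return "_".join(part.replace(" ", "-").lower() for part in parts)

print(generate_display_name("MobileNetV2", "x86_64-cascadelake",
                            ["experimental-flags"]))
# -> mobilenetv2_x86_64-cascadelake_experimental-flags
```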

The second step is to annotate the names on the build rules and output
artifacts. We could reconsider combining the readable name with the SHA256 as
the artifact file name, but that runs into problems with long file names and
disallowed characters (although they should be solvable). Another simple option
is to add the name either as a comment on each CMake build rule or in the
FRIENDLY_NAME argument of the iree_bytecode_module rule. This should at least
let people find the readable name when searching for the SHA256 build target
name in generated_e2e_test_iree_artifacts.cmake or in the build log.

Searchability and Traceability in IREE Benchmark Suites

The same problem exists in the IREE benchmark suites: the benchmark ID is only
a SHA256 hash, and the serialized config file, although it contains all the
metadata, is not easy to read.

In addition, we decided not to dump all execution benchmark flagfiles into the
e2e_test_artifacts directory, because we expected the number of files to grow
quickly: it is N x the number of module artifacts, where N ~= the number of
thread configurations plus experiments. The current N is around 3 (~330
execution benchmarks in total). It also isn't necessary when all benchmarks can
be listed in a single JSON config file.

However, the config format is not very readable, because the serializer uses
IDs to reference nested objects (in order to deduplicate common objects). For
example:

"iree_e2e_model_run_configs:<execution_benchmark_id_1>": {
  "module_generation_config": <id to its module generation config>
  "run_flags": [...]
},
"iree_e2e_model_run_configs:<execution_benchmark_id_2>": {
  "module_generation_config": <id to its module generation config>
  "run_flags": [...]
},
"iree_module_generation_configs:<compilation_benchmark_id_3>": {
  "imported_model": <id to its imported model>
  "compile_flags": [...]
},
"iree_module_generation_configs:<compilation_benchmark_id_4>": {
  "imported_model": <id to its imported model>
  "compile_flags": [...]
},
"iree_imported_models:<imported_model_id>": {
  "model": <id to its model>
},
"models:<model_id>": {
  "name": <model name>,
  ...
},
...

This format requires people to search back and forth by ID to gather all of a
benchmark's sub-components and their metadata. Naturally we would prefer a more
straightforward nested layout like the one below (a small resolution sketch
follows the example):

{
  "e2e_model_run_configs": {
    "<execution_benchmark_id>": {
      "run_flags": [...],
      "tags": [...],
      "module_generation_config": {
        "compile_flags": [...],
        "tags": [...],
        "imported_model": {
          "import_flags": [...],
          "model": {
            "name": <model_name>,
            "tags": [...],
          }
        }
      },
    }
  },
  ...
}
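
As an illustration, a small Python sketch that resolves the ID references of the serialized format into the nested layout above; the key prefixes and field names mirror the examples and are assumptions, not the exact serializer schema:

```python
# Hypothetical sketch: denormalize the ID-referenced config into nested,
# self-contained benchmark objects.
def resolve(config: dict, key: str) -> dict:
    """Recursively replace ID references with the referenced objects."""
    obj = dict(config[key])
    for field, prefix in [
        ("module_generation_config", "iree_module_generation_configs"),
        ("imported_model", "iree_imported_models"),
        ("model", "models"),
    ]:
        if isinstance(obj.get(field), str):
            obj[field] = resolve(config, f"{prefix}:{obj[field]}")
    return obj

def expand_run_configs(config: dict) -> dict:
    """Expand every execution benchmark into a fully nested object."""
    return {
        key.split(":", 1)[1]: resolve(config, key)
        for key in config
        if key.startswith("iree_e2e_model_run_configs:")
    }
```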
Solutions

We could consider dumping all compilation and run flags into a single file together with benchmark names, tags, and IDs, so that it is easier to search benchmarks by keyword in a text editor.

We might also need to re-evaluate the decision not to dump all run flags; 300-400 flag files might not be that problematic.

@ScottTodd (Member) commented

The new improvements are great, especially the docs (https://github.com/openxla/iree/blob/main/docs/developers/developing_iree/benchmark_suites.md#3-fetch-the-benchmark-artifacts) and summaries (e.g. https://github.com/openxla/iree/actions/runs/4832269569#summary-13105536036).

One thing I'd like is a way to see how large the files are and to filter the download to only the "small" programs. When I run gcloud storage cp "%E2E_TEST_ARTIFACTS_DIR_URL%/*.mlir" [destination path], it starts to fetch ~54GB of files. I could let that download complete, but what I'm usually looking for is a snapshot of several real programs, and if I wanted the large programs I wouldn't mind running a different command. (I also wonder if we're actually storing 50+GB of data for each benchmark run... are those files compressed or deduplicated in the storage system?)

This was working well for me before (266MB):
[screenshot of the previous artifact listing]

I'm sure there's a way to do that with the gcloud CLI, so consider this a selfish request for an alternate command to copy/paste, a script to run, or some directory structure changes :p

@pzread (Contributor, Author) commented May 9, 2023

Yeah, it's unhealthy to upload 50GB of artifacts to the GCS for every presubmit commit. It's a new problem after we added a new long-running category with larger models to the benchmark suite. I think the first step is to create separate build targets so we don't build the "long-running" benchmarks in the presubmit (or even in the postsubmit, since we only run them nightly).

A trick to download only the small files:

gcloud storage ls -l "gs://iree-github-actions-presubmit-artifacts/4927200698/1/e2e-test-artifacts/*.mlir*" | sort -h

then pipe it through some shell tools to get a list of the smaller files. I'll update the doc if I find something suitable (or feel free to update it with your commands). (I also realized we are now uploading *.mlir and *.mlirbc, so the docs probably need updating too...)
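
For example, a hedged helper (not part of the repo) that keeps only the small files from the ls -l output, assuming the size in bytes is the first whitespace-separated column and the gs:// URL is the last:

```python
# Hypothetical filter_small.py: read `gcloud storage ls -l` output from stdin
# and print only the URLs of files under a size threshold.
import sys

MAX_BYTES = 100 * 1024 * 1024  # 100 MiB; adjust as needed.

for line in sys.stdin:
    fields = line.split()
    if len(fields) < 2 or not fields[0].isdigit():
        continue  # Skip summary/blank lines.
    size, url = int(fields[0]), fields[-1]
    if size <= MAX_BYTES:
        print(url)
```

Something like gcloud storage ls -l "..." | python filter_small.py | xargs -n1 -I{} gcloud storage cp {} . could then fetch just those files (untested sketch).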

Another option is to put TF/TFLite into the file name, which would let you download only the TFLite models, which are usually smaller.

@pzread (Contributor, Author) commented Jan 23, 2024

The major improvements are considered done.

@pzread closed this as completed Jan 23, 2024