Add PyTorch support to perf.iree.dev #12537
Conversation
Force-pushed 3d07c8a to c43dd52
Abbreviated Benchmark Summary @ commit eb4c7e7c6ed10d5e4149b11d9b9c9d467bfc5271 (no previous benchmark results to compare)
Raw latencies: top 3 out of 139 results shown. No improved or regressed compilation metrics 🏖️
# `ClipTextModel` encodes text into an embedding.
#
# Used in Stable Diffusion to convert a text prompt into an embedding for input to the `Unet2d` model.
#
# Converted from https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel
Is this model pretrained / does it have real weights?
All models are pretrained and pulled from HuggingFace.
source_url="https://storage.googleapis.com/iree-model-artifacts/pytorch/torch_models_20230307.103_1678163233/SD_CLIP_TEXT_MODEL_SEQLEN64/linalg.mlir",
entry_function="forward",
input_types=["1x77xi64", "1x77xi64"])
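As a side note for anyone wiring these definitions into a runner: the `entry_function` and `input_types` fields map fairly directly onto `iree-run-module` flags. A hypothetical helper (the function name and the zero-splat placeholder values are illustrative, not part of this PR, and assume current `iree-run-module` flag spellings) might look like:

```python
def input_flags(entry_function, input_types):
    """Build hypothetical iree-run-module flags from a model's entry
    function and input type strings like "1x77xi64".

    Uses a zero-splat value for each input as placeholder data.
    """
    flags = [f"--function={entry_function}"]
    # Each input becomes "--input=SHAPExDTYPE=VALUE".
    flags += [f"--input={t}=0" for t in input_types]
    return flags

print(input_flags("forward", ["1x77xi64", "1x77xi64"]))
# → ['--function=forward', '--input=1x77xi64=0', '--input=1x77xi64=0']
```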
Did we ever add correctness testing to our benchmark suites? If so, I'd expect some sort of "expected outputs" file (for iree-run-module, or iree-run-trace?) to also be hosted in the model artifacts bucket here. (We should be testing correctness for everything we benchmark, but we don't need to benchmark everything that we test for correctness)
+1 for correctness testing. The model implementations in iree-samples save the input and output arrays in .npy format. I believe there is an open item to add correctness testing here. @pzread
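For reference, a correctness check over saved `.npy` input/output pairs can be very small. A sketch (the function name, file layout, and tolerances are illustrative, not an existing utility):

```python
import numpy as np

def check_outputs(expected_path, actual_path, rtol=1e-4, atol=1e-4):
    """Compare a run's output .npy file against a golden .npy file."""
    expected = np.load(expected_path)
    actual = np.load(actual_path)
    if not np.allclose(actual, expected, rtol=rtol, atol=atol):
        max_err = float(np.max(np.abs(actual - expected)))
        raise AssertionError(f"Outputs differ; max abs error {max_err}")
    return True
```

Tolerances would likely need tuning per model and per backend, which is one reason a golden-output file per model in the artifacts bucket would be handy.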
${PACKAGE_NAME}_iree-imported-model-d4a10c6d3e8a11d808baf398822ea8b61be07673517ff9be30fbe199b7fdd960
${PACKAGE_NAME}_iree-imported-model-a122dabcac56c201a4c98d3474265f15adba14bff88353f421b1a11cadcdea1f
${PACKAGE_NAME}_model-9a9515c7-cb68-4c34-b1d2-0e8c0a3620b8
${PACKAGE_NAME}_model-340553d1-e6fe-41b6-b2c7-687c74ccec56
Why are these names different?
It's because these are already-imported MLIR files, rather than the result of running an importer on the PyTorch models every time we build the benchmark suite. So there is no intermediate iree-imported-model-* target to import the model.

As for why we use the MLIR files directly: @mariecwhite and I discussed this before. We think it's fine to use the imported MLIR directly for PyTorch benchmarks because:
- The script to import PyTorch models requires lots of extra dependencies, and we probably don't want to ask people to install them when building the IREE benchmark suite.
- MLIR compatibility is the main reason we import TF/TFLite models in every build, but since we have no integration with torch-mlir in the IREE repo, there is no benefit to importing PyTorch models to MLIR in every build.
- I remember Marie mentioning that the imported MLIR for the PyTorch models is relatively stable, so it only needs to be regenerated once in a while, when it breaks.

This is a mid-term solution until we can move the benchmark suite out of the IREE build system; at that point we can take on the extra dependencies to run torch-mlir in every build.
Similar to my comments on #12017 (comment): if we're using artifacts in an unstable format, they need to be stored with version information, and the process for regenerating them needs to be clearly documented in a discoverable way.
Version info is encoded in the GCS bucket hosting the MLIR files: https://pantheon.corp.google.com/storage/browser/iree-model-artifacts/pytorch/torch_models_20230307.103_1678163233. 20230307.103 is the version of torch-mlir used. Inside the GCS bucket is a version_info.txt file that stores the output of pip list. It may be difficult to exactly reproduce the MLIR files from scratch on a different machine, since the packages may no longer be hosted; if we want exact reproducibility, we'll need to create images with the pip environment saved.

In terms of encoding this version info in the dashboards and/or database, I'll defer to @pzread.
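To make the environment snapshot part of the artifact-generation step itself, something like the sketch below could write the version_info.txt (the function name and pin format are illustrative; the actual file stores `pip list` output):

```python
from importlib import metadata

def write_version_info(path="version_info.txt"):
    """Write installed package pins next to the generated MLIR artifacts,
    similar in spirit to checking in the output of `pip list`."""
    lines = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions())
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```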
MODEL_BERT_LARGE_TF_FP32_SEQLEN384 = "8871f602-571c-4eb8-b94d-554cc8ceec5a"
MODEL_CLIP_TEXT_SEQLEN64_FP32_TORCH = "9a9515c7-cb68-4c34-b1d2-0e8c0a3620b8"
MODEL_UNET_2D_FP32_TORCH = "340553d1-e6fe-41b6-b2c7-687c74ccec56"
Should we group these by framework?
# Models
# TF
MODEL_FOO = ...
# TFLite
MODEL_BAR = ...
# PyTorch
MODEL_BAZ = ...
# Model input data
...
Grouped
torch_models.MODEL_CLIP_TEXT_SEQLEN64_FP32_TORCH,
# Disabled due to https://github.com/openxla/iree/issues/11447.
#torch_models.MODEL_UNET_2D_FP32_TORCH,
Are we tracking any "small" PyTorch models? It would be nice to have coverage for a variety of model architectures.
We'll be adding more PyTorch models in the coming weeks. I'll keep this in mind and make sure we add small torch models (probably an EfficientNet small since we have a TF version of this).
# Implementations of the models listed below can be found in `https://github.com/iree-org/iree-samples/tree/main/iree-torch/importer`.
# We import the PyTorch models offline and make the .mlir available here for benchmarking.
# If the mlir artifacts need to be updated, please run [update_torch_models.sh](https://github.com/iree-org/iree-samples/blob/main/iree-torch/importer/update_torch_models.sh)
The script at https://github.com/iree-org/iree-samples/blob/main/iree-torch/importer/update_torch_models.sh could use more documentation:
- Sample command line showing expected usage (including any CLI args or env vars)
- Prerequisites (Linux only? pip install or build some packages from source first? auth for gcloud)
- What the script does (how long it takes, how much disk/compute it needs, etc.)
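For the first two bullets, a self-documenting preamble in the script itself usually suffices. A sketch (every name, default, and prerequisite below is illustrative, not the actual script):

```shell
#!/bin/bash
# update_torch_models.sh (illustrative preamble only)
#
# Usage:
#   OUTPUT_DIR=/path/to/artifacts ./update_torch_models.sh
#
# Hypothetical prerequisites: Linux, python3 + pip, gcloud auth for uploads.
set -euo pipefail

# Default the output location so the script runs without arguments.
OUTPUT_DIR="${OUTPUT_DIR:-/tmp/torch_models}"
mkdir -p "$OUTPUT_DIR"
echo "Importing PyTorch models into ${OUTPUT_DIR}"
```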
Created a PR with updates: iree-org/iree-experimental#111
I didn't include how long it takes or how much disk/compute it needs, since that will change as we add more models.
Force-pushed c43dd52 to 47e81fa
Adds ClipTextModel and Unet2d to the benchmarking suite. benchmarks: x86_64, cuda
Force-pushed 47e81fa to 58a65e2
# Converted from https://huggingface.co/docs/diffusers/api/models#diffusers.UNet2DConditionModel
MODEL_UNET_2D_FP32_TORCH = common_definitions.Model(
    id=unique_ids.MODEL_UNET_2D_FP32_TORCH,
    name="Unet2dPT",
Will there be int8 or fp16 versions of these models? If so, maybe we can append _fp32 as with other models. It's better to make sure model names are unique (even if they are no longer the primary keys of benchmarks).
This model already has FP32 in its name. Let's close on a naming convention and I'll update the model names in a separate PR.
Thanks Marie!
@@ -31,12 +32,12 @@ def generate(
) -> Tuple[List[iree_definitions.ModuleGenerationConfig],
           List[iree_definitions.E2EModelRunConfig]]:
  """Generates IREE compile and run configs."""

  models = model_groups.LARGE + [torch_models.MODEL_UNET_2D_FP32_TORCH]
Should we define this as a group instead? @pzread
Oh I see, in the RISC-V benchmarks for instance, we define a constant with the relevant models. Let's do that here too (just slightly more obvious than having it inline, IMO).
Yeah, the small/large model groups turn out not to be a good design for handling different model sets for different architectures. I'm still thinking about how to organize this better.
In practice, the model groups are tied heavily to the backend and what is supported in the frontend dialects, so it might make sense to group based on backend.
Adds ClipTextModel and Unet2d to the benchmarking suite. benchmarks: x86_64, cuda
iree-org#12693 was merged after iree-org#12690 and iree-org#12537 without regenerating the cmake files (the presubmit passed before the other two PRs merged). Regenerated the cmake files.