
add test configurations to run Torch compile (#95) #155

Merged

Conversation

@aliabdelkader (Contributor) commented Mar 12, 2024

  • add test configurations for cuda inference using torch compile

@IlyasMoutawwakil (Member)

Hi @aliabdelkader,
Thanks a lot for adding these tests.
I don't think there's any interest in adding bettertransformer at this point, since scaled_dot_product_attention is being integrated directly into transformers. It's also a fairly simple integration (model.to_bettertransformer), so testing it will only slow down the CI.
torch.compile on the other hand is interesting since it depends on the library, but I don't think it's compatible with language modeling's generate method right now (it recompiles the model for every token).
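For context, a minimal sketch of the pattern being discussed here, compiling a causal LM and then calling generate; the model id and prompt are arbitrary placeholders, not part of this PR:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Compile the forward pass; generate() still drives the decoding loop.
model.forward = torch.compile(model.forward)

inputs = tokenizer("Hello world", return_tensors="pt")
# At the time of this discussion, the growing sequence length could trigger
# a recompilation at each decoding step, which is the slowdown described above.
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```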

@aliabdelkader changed the title from "add test configurations to run BetterTransformer and Torch compile (#95)" to "add test configurations to run Torch compile (#95)" on Mar 14, 2024
@IlyasMoutawwakil (Member)

it seems there's an error with the diffusers config. tbh I've never tested compiling diffusers pipelines, did you test it locally?

@aliabdelkader (Contributor, Author)

Yes, I did, but to be honest in a fairly limited environment, basically a Colab/Kaggle notebook. I have seen that error before, but I mistakenly thought it was a limitation of that environment's setup.

Now, I think that inductor/Triton is attempting to benchmark different kernels on the GPU to pick the fastest kernel configuration, and one of those launches is producing that illegal memory access error.

I am not an expert, but it seems that this behavior from inductor is disabled by forcing PyTorch/cuDNN to be deterministic, i.e. setting torch.use_deterministic_algorithms() to true and the environment variable CUBLAS_WORKSPACE_CONFIG to :16:8 or :4096:8. I think this would limit GPU performance, and I am not sure if there are other implications.

I gave that a try locally and it seems to have removed the error. Would you like me to make the change, or remove the test?
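A minimal sketch of the settings mentioned above (the values are the ones PyTorch documents for deterministic cuBLAS behaviour; whether the performance cost is acceptable for a benchmark CI is exactly the open question):

```python
import os
import torch

# Must be set before the first cuBLAS call; ":16:8" is the lower-memory alternative.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Restrict PyTorch/cuDNN to deterministic algorithms; the commenter found that
# this combination removed the illegal memory access error locally.
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
```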

@IlyasMoutawwakil (Member) commented Mar 18, 2024

@aliabdelkader thank you for looking this up,
I'm investigating this blog post https://pytorch.org/blog/accelerating-generative-ai-3/ where they mention setting some inductor configs

- remove the cpu / cuda test configurations that use gpt and torch compile
- add a test configuration to run an inference benchmark with a timm model and torch compile on a cuda gpu
- enabling the conv_1x1_as_mm option removes the cuda illegal memory access error that occurs during the tiny stable diffusion model's compilation on the CI machine or an nvidia T4 gpu
@aliabdelkader (Contributor, Author)

Hi @IlyasMoutawwakil

Thanks for the link to the blog post.

I have checked the options mentioned there. I found that setting the mode to max-autotune, or setting the inductor option conv_1x1_as_mm to true, is enough to resolve that illegal memory access error. To be honest, I do not know why the error gets resolved with those configurations.

I decided to make a commit with conv_1x1_as_mm instead of max-autotune because I saw that autotuning takes a bit of time, and I thought it would be better for the CI to commit the faster option.
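To make the two options concrete, a rough sketch of how they would be applied when compiling a diffusers pipeline; the model id is a placeholder for the tiny Stable Diffusion checkpoint used in the tests, and a CUDA GPU is assumed:

```python
import torch
from diffusers import DiffusionPipeline

# Option committed here: have inductor lower 1x1 convolutions as matmuls.
torch._inductor.config.conv_1x1_as_mm = True

pipe = DiffusionPipeline.from_pretrained(
    "hf-internal-testing/tiny-stable-diffusion-torch"  # placeholder model id
).to("cuda")
pipe.unet = torch.compile(pipe.unet)

# Alternative that also avoided the error, at the cost of autotuning time:
# pipe.unet = torch.compile(pipe.unet, mode="max-autotune")

image = pipe("a photo of a cat", num_inference_steps=2).images[0]
```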

I hope that was somewhat helpful. Please let me know what you think.

@IlyasMoutawwakil (Member)

thanks a lot for this optimal fix!

… rocm 5.7

- starting with pytorch nightly version 2.3, inductor uses a triton-rocm compiler version that attempts to load rocm 6. Therefore, the torch_compile tests fail with pytorch 2.3+ and rocm 5.7
@aliabdelkader (Contributor, Author)

Hi @IlyasMoutawwakil

I checked the cli rocm pytorch test failure. The test was failing because of a missing library, libamdhip64.so.6, which should be installed by rocm 6.

It seems that inductor (the triton compiler) in pytorch nightly versions 2.3+ wants to load rocm 6.

I removed the torch_compile tests from the workflow file for the pytorch nightly and rocm5.7 image.

But that feels like a workaround. Do you think there is a better way of handling it?

@IlyasMoutawwakil (Member) commented Mar 22, 2024

@aliabdelkader thanks for the fix, could we just set torch_pre_release to 0 for rocm5.7? That way we use the stable torch release.

… failure

- In pytorch nightly, starting with version 2.3, inductor uses a triton compiler that attempts to load rocm 6, which was causing the torch_compile tests to fail.
- remove pytorch nightly from the cli rocm pytorch workflow.
- Revert "disable torch_compile tests for pytorch nightly version 2.3+ with amd rocm 5.7" commit
@aliabdelkader (Contributor, Author)

@IlyasMoutawwakil yes I think that would work. I pushed that change.

@IlyasMoutawwakil merged commit 9141f5b into huggingface:main on Mar 22, 2024
21 of 22 checks passed