
add test configurations to run Torch compile (#95) #155

Merged

Conversation

@aliabdelkader (Contributor) commented Mar 12, 2024

  • add test configurations for cuda inference using torch compile

@IlyasMoutawwakil (Member)

Hi @aliabdelkader,
Thanks a lot for adding these tests.
I don't think there's any interest in adding bettertransformer at this point, since scaled_dot_product_attention is being integrated directly into transformers. It's also a fairly simple integration (model.to_bettertransformer), so testing it will only slow down the CI.
torch.compile on the other hand is interesting since it depends on the library, but I don't think it's compatible with language modeling's generate method right now (it recompiles the model for every token).
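For context, a minimal sketch of the pattern being discussed here, compiling a causal LM and then calling generate; the model id and prompt are arbitrary placeholders, not part of this PR:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Compile the forward pass; generate() still drives the decoding loop.
model.forward = torch.compile(model.forward)

inputs = tokenizer("Hello world", return_tensors="pt")
# At the time of this discussion, the growing sequence length could trigger
# a recompilation at each decoding step, which is the slowdown described above.
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```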

@aliabdelkader changed the title from "add test configurations to run BetterTransformer and Torch compile (#95)" to "add test configurations to run Torch compile (#95)" on Mar 14, 2024
@IlyasMoutawwakil (Member)

it seems there's an error with the diffusers config. tbh I've never tested compiling diffusers pipelines, did you test it locally?

@aliabdelkader (Contributor, Author)

Yes, I did, but to be honest in a fairly limited environment, basically a Colab/Kaggle notebook. I have seen that error before, but I mistakenly thought it was a limitation of that environment's setup.

Now, I think that inductor/Triton is attempting to benchmark different kernels on the GPU to pick the fastest kernel configuration, and one of those launches is producing that illegal memory access error.

I am not an expert, but it seems that this behavior from inductor is disabled by forcing PyTorch/cuDNN to be deterministic, i.e. setting torch.use_deterministic_algorithms() to true and the environment variable CUBLAS_WORKSPACE_CONFIG to :16:8 or :4096:8. I think this would limit GPU performance, and I am not sure if there are other implications.

I gave that a try locally and it seems to have removed the error. Would you like me to make the change, or remove the test?
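A minimal sketch of the settings mentioned above (the values are the ones PyTorch documents for deterministic cuBLAS behaviour; whether the performance cost is acceptable for a benchmark CI is exactly the open question):

```python
import os
import torch

# Must be set before the first cuBLAS call; ":16:8" is the lower-memory alternative.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Restrict PyTorch/cuDNN to deterministic algorithms; the commenter found that
# this combination removed the illegal memory access error locally.
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
```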

@IlyasMoutawwakil (Member) commented Mar 18, 2024

@aliabdelkader thank you for looking this up,
I'm investigating this blog post https://pytorch.org/blog/accelerating-generative-ai-3/ where they mention setting some inductor configs

- remove the cpu / cuda test configurations that use gpt and torch compile
- add a test configuration to run an inference benchmark with a timm model and torch compile on a cuda gpu
- enabling the conv_1x1_as_mm option removes the cuda illegal memory access error that occurs during the tiny stable diffusion model's compilation on the CI machine or an nvidia T4 gpu
@aliabdelkader (Contributor, Author)

Hi @IlyasMoutawwakil

Thanks for the link to the blog post.

I have checked the options mentioned there. I found that setting the mode to max-autotune, or setting the inductor option conv_1x1_as_mm to true, is enough to resolve that illegal memory access error. To be honest, I do not know why the error gets resolved with those configurations.

I decided to make a commit with conv_1x1_as_mm instead of max-autotune because I saw that autotuning takes a bit of time, and I thought it would be better for the CI to commit the faster option.
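To make the two options concrete, a rough sketch of how they would be applied when compiling a diffusers pipeline; the model id is a placeholder for the tiny Stable Diffusion checkpoint used in the tests, and a CUDA GPU is assumed:

```python
import torch
from diffusers import DiffusionPipeline

# Option committed here: have inductor lower 1x1 convolutions as matmuls.
torch._inductor.config.conv_1x1_as_mm = True

pipe = DiffusionPipeline.from_pretrained(
    "hf-internal-testing/tiny-stable-diffusion-torch"  # placeholder model id
).to("cuda")
pipe.unet = torch.compile(pipe.unet)

# Alternative that also avoided the error, at the cost of autotuning time:
# pipe.unet = torch.compile(pipe.unet, mode="max-autotune")

image = pipe("a photo of a cat", num_inference_steps=2).images[0]
```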

I hope that was somewhat helpful. Please let me know what you think.

@IlyasMoutawwakil (Member)

thanks a lot for this optimal fix!

… rocm 5.7

- starting with pytorch nightly version 2.3, inductor uses a triton-rocm compiler version that attempts to load rocm 6. Therefore, the torch_compile tests fail with pytorch 2.3+ and rocm 5.7
@aliabdelkader (Contributor, Author)

Hi @IlyasMoutawwakil

I checked the cli rocm pytorch test failure. The test was failing because of a missing library, libamdhip64.so.6, which should be installed by rocm 6.

It seems that inductor (the triton compiler) in pytorch nightly versions 2.3+ wants to load rocm 6.

I removed the torch_compile tests from the workflow file for the pytorch nightly and rocm5.7 image.

But that feels like a workaround. Do you think there is a better way of handling it?

@IlyasMoutawwakil (Member) commented Mar 22, 2024

@aliabdelkader thanks for the fix, could we just set torch_pre_release to 0 for rocm5.7? That way we use the stable torch release.

… failure

- In pytorch nightly, starting with version 2.3, inductor uses a triton compiler that attempts to load rocm 6, which was causing the torch_compile tests to fail.
- remove pytorch nightly from the cli rocm pytorch workflow.
- Revert "disable torch_compile tests for pytorch nightly version 2.3+ with amd rocm 5.7" commit
@aliabdelkader (Contributor, Author)

@IlyasMoutawwakil yes I think that would work. I pushed that change.

@IlyasMoutawwakil merged commit 9141f5b into huggingface:main on Mar 22, 2024
21 of 22 checks passed