
IFU to v2.0.4 #14
Merged 289 commits into flash_attention_for_rocm on Nov 3, 2023

Conversation

@jayz0123 commented Sep 19, 2023

  • Renamed "unpadded" -> "varlen"; the APIs are now mha_varlen_fwd & mha_varlen_bwd (see the sketch after this list).
  • Changed mha_fwd & mha_bwd to take inputs where every sequence in the batch has the same length.
  • setup.py now installs for either a CUDA or a ROCm system.
  • Renamed test_flash_attn -> test_flash_attn_rocm for the ROCm unit tests.
  • Benchmark testing.
  • Synced to PR bwd optimizing based on profiling #15.
  • Added unit tests for mha_fwd & mha_bwd.
  • MQA/GQA support.
  • All unit tests pass.
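
For orientation, here is a minimal sketch of how the variable-length ("varlen") path is typically called from Python. The function name flash_attn_varlen_func, its import path, and the exact signature follow the upstream flash-attn 2.x interface and are assumptions here; this fork may expose the mha_varlen_fwd / mha_varlen_bwd bindings differently.

```python
import torch
from flash_attn import flash_attn_varlen_func  # assumed import path and name

# Two sequences of lengths 3 and 5, packed into one (total_tokens, nheads, headdim)
# tensor and described by cumulative sequence lengths instead of padding.
cu_seqlens = torch.tensor([0, 3, 8], dtype=torch.int32, device="cuda")
total_tokens, nheads, headdim = 8, 4, 64
q = torch.randn(total_tokens, nheads, headdim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=5, max_seqlen_k=5,
    dropout_p=0.0, causal=True,
)  # out: (total_tokens, nheads, headdim)
```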

Current unit test result (PyTorch 2.0.0; ROCm 5.6): 3968 passed, 63 skipped.

Current performance on MI250 (docker pull rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1):

|      | fwd TFLOPS | bwd TFLOPS | total TFLOPS |
| ---- | ---------- | ---------- | ------------ |
| fp16 | 52.16      | 39.93      | 42.49        |
| bf16 | 52.36      | 30.25      | 34.21        |

tridao and others added 30 commits (December 25, 2022 14:29 onward), including:
  • Follow xFormers's DISTPATCH_BOOL. Haven't tested it on Windows.
  • fixed cross attention typeerror
@jayz0123 (Author):

A new environment variable, FLASH_ATTENTION_INTERNAL_ENABLE_TIME_KERNEL, toggles whether kernel running times are reported.
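
As a usage sketch only: the comment does not say whether the flag is read at build time or at run time, or which values it accepts, so the snippet below assumes a run-time on/off switch with "1" meaning enabled.

```python
import os

# Assumed usage: enable kernel-time reporting before the extension is imported.
# Both the value "1" and the run-time semantics are assumptions, not confirmed here.
os.environ["FLASH_ATTENTION_INTERNAL_ENABLE_TIME_KERNEL"] = "1"

import flash_attn  # kernel running times would then be reported during execution
```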

@jayz0123 (Author) commented Oct 26, 2023

[BUG] Previously, in the older version of FA, the z and softmax_lse tensors for the grouped GEMM were created with the maximum sequence lengths and no padding, so their strides differed from batch to batch. This behaviour caused wrong results from CK. Fixing it.
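
A small illustration (not the PR's actual code) of the failure mode described above: per-batch tensors sized to each batch's own sequence length end up with different strides, whereas one buffer padded to the global maximum gives every batch the same stride, which is the layout a grouped kernel expecting a uniform per-batch stride can consume.

```python
import torch

# Hypothetical shapes for illustration only.
seqlens = [3, 5]   # two batches with different sequence lengths
nheads = 8

# Unpadded layout: one tensor per batch, each sized to its own length.
unpadded = [torch.empty(nheads, s) for s in seqlens]
print([t.stride() for t in unpadded])   # [(3, 1), (5, 1)] -- strides differ per batch

# Padded layout: a single buffer sized to the maximum length, so every batch
# shares the same stride.
padded = torch.empty(len(seqlens), nheads, max(seqlens))
print(padded.stride())                  # (40, 5, 1) -- identical for every batch
```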

@fsx950223: Please remove *_hip.hpp

.gitignore Outdated
@@ -24,7 +24,10 @@ var/
.vscode/settings.

# Generated files
csrc/flash_attn_rocm/src/*hip*
@fsx950223 (Oct 30, 2023): better to use *_hip.*?

@jayz0123 (Author): makes sense

@fsx950223 (Oct 30, 2023): Yes, file names such as hip_flash_attention.cpp or hip_hacks.hpp are ignored too?
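
For context on the two patterns being discussed, a short sketch of how they differ; the example file names are the ones mentioned in the thread, and which files the build actually generates is not restated here.

```
# Pattern from the diff: matches any file name containing "hip", so it would also
# hide hand-written sources such as hip_flash_attention.cpp or hip_hacks.hpp.
csrc/flash_attn_rocm/src/*hip*

# Suggested narrower pattern: only files ending in "_hip.<ext>",
# i.e. the generated *_hip.cpp / *_hip.hpp files.
csrc/flash_attn_rocm/src/*_hip.*
```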

@sabreshao merged commit edc7698 into flash_attention_for_rocm on Nov 3, 2023.