
Compatibility of flash attention 2 and type conversion due to accelerator.prepare #30009

Closed
bellos1203 opened this issue Apr 3, 2024 · 2 comments

@bellos1203

Hi, I'm trying to fine-tune my model, BLIP-2 with an OPT-2.7B language model, using Flash Attention 2, but FA2 produces a significantly higher loss than eager attention, which seems similar to previously reported issues (#26498, #28925, #28142).
From the comments in those issues, the recommended way to use FA2 is to load the model in full precision and train it under an autocast context.
However, when using the accelerate library, the accelerator.prepare function converts the model to the specified dtype (bf16 in my case), including the layer norms.
I suspect this is what causes the problem for me, but I'm not sure.

Could you check this behavior and give any suggestions? I'm using transformers==4.40.0.dev0, accelerate==0.23.0, and flash_attn==2.5.5.
If there are any more details I should elaborate on, please let me know.
Thanks in advance :)
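For reference, a minimal sketch of the "full precision + autocast" recipe mentioned above might look like the following. The checkpoint name `facebook/opt-2.7b` matches the OPT-2.7B model discussed here, but the optimizer, learning rate, and `dataloader` are illustrative placeholders, not part of the issue.

```python
# Sketch only: load the model in fp32 and run the forward pass under bf16 autocast.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-2.7b",                      # OPT-2.7B language model from the issue
    torch_dtype=torch.float32,                # keep weights (incl. layer norms) in fp32
    attn_implementation="flash_attention_2",
).cuda()
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for batch in dataloader:                      # `dataloader` is assumed to exist
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss            # autocast handles the fp32 -> bf16 casts
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```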

@ArthurZucker
Collaborator

Hey 🤗 thanks for opening an issue! We try to keep the GitHub issues for bugs and feature requests.
Could you ask your question on the forum instead? I'm sure the community will be of help!

Also, I think you should be able to prevent accelerate from casting the model to a different dtype when preparing it!
Thanks!
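A hypothetical sketch of that suggestion: keep the model in fp32 and let Accelerate handle bf16 via autocast rather than converting the weights. The exact behaviour of `prepare()` depends on your accelerate config and plugins (e.g. DeepSpeed), so treat this as an assumption to verify, not a confirmed fix.

```python
# Sketch only: configure Accelerate for bf16 mixed precision (autocast-based AMP).
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator(mixed_precision="bf16")  # native AMP, not an upfront weight cast

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-2.7b",                      # placeholder checkpoint from the issue
    torch_dtype=torch.float32,                # load in full precision
    attn_implementation="flash_attention_2",
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# With native AMP, prepare() is expected to wrap the forward pass in autocast
# rather than cast the weights, so the layer norms should stay in fp32.
model, optimizer = accelerator.prepare(model, optimizer)
```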

@bellos1203
Author

bellos1203 commented Apr 6, 2024

Ok, I posted my question on the forum! Thanks for your reply :)
