
Allow flash attention 2 and upgrade to transformers 4.34.1 #672

Merged: 30 commits into mosaicml:main from tr34-flash2 on Oct 24, 2023

Conversation

@dakinggg (Collaborator) commented on Oct 13, 2023

This PR does a few things that come together:

  • Upgrades to transformers 4.34
  • Small compatibility fixes for 4.34
  • Transformers 4.34 breaks when flash-attn < 2 is installed, so some monkeypatching is added to work around that
  • Transformers 4.34 adds support for using flash attention 2 with some models, so we add support for that argument (see the sketch after this list)
  • Adds tests for the new Mistral model
  • Adds documentation about the new flash attention options
  • Adds test coverage for all combinations of flash-attn version, patch type, flash attention 2, etc.
  • Waits for 4.34.1 and pins to that (removing some of the above patches)
  • Manual tests for Llama and Mistral confirm that both train with reasonable loss curves and that the model printout shows flash attention 2 in use
  • Waits for Upgrade to transformers 4.34.1 (composer#2635) to be merged (it doesn't need to be released; we just want confidence in the new transformers version)
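
A hedged sketch of what the new argument does at the transformers level under 4.34.x (the checkpoint name and dtype are illustrative, and in llm-foundry this would be wired through the train config rather than called directly):

```python
# Sketch only: assumes transformers>=4.34 and flash-attn>=2 are installed.
import torch
from transformers import AutoModelForCausalLM

# In transformers 4.34.x, flash attention 2 is requested with
# `use_flash_attention_2=True`; supported models include Llama and Mistral.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # illustrative checkpoint
    torch_dtype=torch.bfloat16,   # flash attention 2 requires fp16/bf16
    use_flash_attention_2=True,
)

# Printing the model shows the flash-attention-2 attention modules,
# which is the manual check mentioned above.
print(model)
```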

@dakinggg marked this pull request as ready for review on October 22, 2023 19:27
@irenedea (Contributor) left a comment

looks good! just had a few q's

Resolved review threads on: tests/test_huggingface_flash.py (outdated), scripts/train/train.py, scripts/train/README.md (outdated), scripts/train/README.md
@irenedea (Contributor) left a comment

LGTM! Thank you! Just a couple of super minor things

@dakinggg changed the title from "Allow flash attention 2 and upgrade to transformers 4.34" to "Allow flash attention 2 and upgrade to transformers 4.34.1" on Oct 24, 2023
@dakinggg enabled auto-merge (squash) on October 24, 2023 17:51
@dakinggg merged commit d72902a into mosaicml:main on Oct 24, 2023
12 checks passed
@dakinggg deleted the tr34-flash2 branch on December 11, 2023 23:44