Shashank/seq id flash attn #738

Merged

Conversation

@ShashankMosaicML (Contributor) commented on Nov 15, 2023:

This PR does three flash-attention-related things (see the code sketch after this list):

  1. Adds support for sequence id masking when using flash attention 2.1.2 or higher.
  2. Gets rid of kv tensor repetition for grouped query attention when using flash attention 2.0.0 or higher.
  3. Adds support for sliding window attention when using flash attention 2.3.0 or higher.
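
For reference, here is a minimal, hypothetical sketch (not the code in this PR) of how the three features map onto the flash-attn API. It assumes flash-attn >= 2.3.0 and a CUDA device; the tensor names and the `sequence_id_to_cu_seqlens` helper are illustrative only.

```python
import torch
from flash_attn import flash_attn_varlen_func  # window_size requires flash-attn >= 2.3.0


def sequence_id_to_cu_seqlens(sequence_id: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: turn per-token sequence ids of one packed row,
    e.g. [0, 0, 0, 1, 1, 2], into cumulative sequence lengths [0, 3, 5, 6]."""
    seqlens = torch.bincount(sequence_id)
    return torch.nn.functional.pad(seqlens.cumsum(0), (1, 0)).to(torch.int32)


seqlen, n_heads, n_kv_heads, head_dim = 6, 8, 2, 64
device, dtype = 'cuda', torch.bfloat16

q = torch.randn(seqlen, n_heads, head_dim, device=device, dtype=dtype)
# (2) With flash-attn >= 2.0.0, k and v keep their smaller number of kv heads for
#     grouped query attention; repeating the kv tensors up to n_heads is not needed.
k = torch.randn(seqlen, n_kv_heads, head_dim, device=device, dtype=dtype)
v = torch.randn(seqlen, n_kv_heads, head_dim, device=device, dtype=dtype)

# (1) Sequence id masking: cu_seqlens marks where each packed sequence starts and
#     ends, so tokens never attend across sequence boundaries.
sequence_id = torch.tensor([0, 0, 0, 1, 1, 2], device=device)
cu_seqlens = sequence_id_to_cu_seqlens(sequence_id)
max_seqlen = int(torch.diff(cu_seqlens).max())

# (3) Sliding window attention (flash-attn >= 2.3.0): each token attends to at most
#     `window` previous tokens in its own sequence; (-1, -1) disables the window.
window = 100
out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
    causal=True,
    window_size=(window, 0),
)  # (seqlen, n_heads, head_dim)
```

This only shows the shape of the API calls involved; the PR's actual wiring is in the llmfoundry files referenced in the review threads below.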

WandB link to the experiments: https://wandb.ai/mosaic-ml/seq_id_FA_final_tests

Loss and throughput curves for a 125M model trained for Chinchilla-optimal steps:

  1. control is the main llmfoundry branch.
  2. treat is the shashank/seq_id_flash_attn branch (this PR) with no config changes.
  3. treat-seq-id-masking is the same branch with sequence id masking turned on.
  4. treat-sliding-window-100 is the same branch with a sliding window of size 100.
[Four screenshots (2023-12-01): loss and throughput curves for the runs listed above.]

@ShashankMosaicML marked this pull request as ready for review on November 16, 2023 at 17:26
@vchiley (Contributor) left a comment:

High level looks OK.

In the PR description, can you include a figure showing the MFU diff with and without masking, and a figure showing the convergence diff with and without masking?

[Review threads (resolved): llmfoundry/models/layers/attention.py, llmfoundry/models/mpt/configuration_mpt.py, llmfoundry/models/mpt/modeling_mpt.py]
@dakinggg (Collaborator) left a comment:
It would be good to train some models to show the equivalence of seq id masking between flash attention and the other attention implementations.
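
For illustration only, here is a rough sketch (not this PR's tests) of the kind of kernel-level equivalence check implied, separate from the full training runs requested: flash attention driven by cu_seqlens derived from sequence_id should match plain PyTorch attention under an explicit block-diagonal causal mask. It assumes flash-attn >= 2.1.2, torch >= 2.0, and a CUDA device; all names are made up for the example.

```python
import torch
from flash_attn import flash_attn_varlen_func

torch.manual_seed(0)
device, dtype = 'cuda', torch.bfloat16
seqlen, n_heads, head_dim = 8, 4, 64
sequence_id = torch.tensor([0, 0, 0, 1, 1, 1, 1, 2], device=device)

q, k, v = (torch.randn(seqlen, n_heads, head_dim, device=device, dtype=dtype) for _ in range(3))

# Flash path: cumulative sequence lengths mark the packed-sequence boundaries.
seqlens = torch.bincount(sequence_id)
cu_seqlens = torch.nn.functional.pad(seqlens.cumsum(0), (1, 0)).to(torch.int32)
out_flash = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=int(seqlens.max()), max_seqlen_k=int(seqlens.max()),
    causal=True,
)

# Reference path: the same math in plain PyTorch with an explicit mask that is
# block-diagonal over sequences and causal within each sequence.
same_seq = sequence_id[:, None] == sequence_id[None, :]
causal = torch.tril(torch.ones(seqlen, seqlen, dtype=torch.bool, device=device))
mask = same_seq & causal  # token i may attend to token j iff same sequence and j <= i
out_ref = torch.nn.functional.scaled_dot_product_attention(
    q.transpose(0, 1).unsqueeze(0),  # -> (1, n_heads, seqlen, head_dim)
    k.transpose(0, 1).unsqueeze(0),
    v.transpose(0, 1).unsqueeze(0),
    attn_mask=mask,
).squeeze(0).transpose(0, 1)  # back to (seqlen, n_heads, head_dim)

assert torch.allclose(out_flash, out_ref, atol=2e-2, rtol=2e-2)
```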

[Review threads (resolved): llmfoundry/models/layers/attention.py, llmfoundry/models/mpt/configuration_mpt.py, llmfoundry/models/mpt/modeling_mpt.py, tests/test_flash_triton_torch.py, tests/test_flash_attn.py, tests/test_model.py]
@dakinggg (Collaborator) left a comment:
LGTM, let's look into that slow test a bit before merging.

[Review threads (resolved): tests/test_flash_attn.py, tests/test_model.py]
@ShashankMosaicML merged commit 84b5d96 into mosaicml:main on Dec 4, 2023
10 checks passed
@ShashankMosaicML deleted the shashank/seq_id_flash_attn branch on December 4, 2023 at 17:37