
Adding support for alibi when using flash attention #820

Merged

Conversation

@ShashankMosaicML (Contributor) commented Dec 23, 2023

ALiBi support was recently added to FlashAttention2 (PR #540) and is stable as of flash-attention v2.4.2. This PR passes ALiBi slopes through to flash attention, enabling ALiBi when using Flash Attention.
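For illustration only (not this PR's exact code), here is a minimal sketch of what passing ALiBi slopes to flash attention looks like with flash-attn >= 2.4.2, which exposes an `alibi_slopes` argument on `flash_attn_func`. The slope schedule and tensor shapes below are assumptions based on the flash-attn interface, not llm-foundry's implementation.

```python
# Sketch: passing ALiBi slopes directly to the FlashAttention-2 kernel.
import torch
from flash_attn import flash_attn_func  # requires flash-attn >= 2.4.2

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Standard ALiBi slope schedule: 2^(-8/n), 2^(-16/n), ..., one slope per head.
    return torch.tensor(
        [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)],
        dtype=torch.float32, device='cuda',
    )

batch, seqlen, n_heads, head_dim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, n_heads, head_dim, device='cuda', dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

# With alibi_slopes set, the kernel applies the ALiBi bias internally, so no
# explicit (seqlen x seqlen) bias tensor needs to be materialized.
out = flash_attn_func(q, k, v, causal=True, alibi_slopes=alibi_slopes(n_heads))
```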

Experiments on 125m and 1b models show (nearly) identical training curves for ALiBi with Triton attention (which is the current default attention implementation when using ALiBi) and ALiBi with Flash Attention. Further, using Flash Attention results in higher MFU numbers. In the plots below, 'treat' refers to flash attention, and 'control' refers to triton attention.

Wandb experiment link
[Screenshots: training loss and MFU curves, flash ('treat') vs. triton ('control') attention]

Note: Flash Attention seems to use more memory than Triton:

[Screenshot: memory usage, flash vs. triton attention]

These changes also effectively enable sliding window attention and memory-efficient sequence ID masking together with ALiBi when using Flash Attention, as sketched below.
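For illustration, a rough sketch (assumptions, not code from this PR) of how ALiBi slopes, a sliding window, and packed-sequence masking can be combined through flash-attn's varlen interface, where cumulative sequence lengths replace a dense attention mask so packed sequences never attend across their boundaries:

```python
# Sketch: ALiBi + sliding window + sequence boundaries via the varlen kernel.
import torch
from flash_attn import flash_attn_varlen_func

n_heads, head_dim = 8, 64
seq_lens = [384, 640]                 # two sequences packed into one batch row
total = sum(seq_lens)

# Unpadded (total_tokens, n_heads, head_dim) tensors, as varlen kernels expect.
q = torch.randn(total, n_heads, head_dim, device='cuda', dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Cumulative sequence lengths mark where each packed sequence starts/ends.
cu_seqlens = torch.tensor([0, 384, 1024], device='cuda', dtype=torch.int32)

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max(seq_lens),
    max_seqlen_k=max(seq_lens),
    causal=True,
    window_size=(256, 0),             # sliding window over the previous 256 tokens
    # Illustrative constant slope; real models use the geometric ALiBi schedule.
    alibi_slopes=torch.full((n_heads,), 0.5, device='cuda', dtype=torch.float32),
)
```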


As an aside, @Skylion007 has been able to successfully update his own version of MosaicBERT with FA2, and has found it to indeed be faster than FA1: w&b link

@Skylion007 (Contributor)

FYI, I added a BERT FA2 PR to the examples repo: mosaicml/examples#440

@ShashankMosaicML changed the title from "Adding support for alibi in flash attention" to "Adding support for alibi when using flash attention" on Jan 2, 2024
@irenedea (Contributor) left a comment

Thanks for doing this! Added some questions and comments.

Review threads (resolved): tests/models/layers/test_flash_attn.py, llmfoundry/models/mpt/modeling_mpt.py, llmfoundry/models/layers/attention.py, tests/models/test_model.py
@jacobfulano (Contributor) left a comment

This looks good to me! Could we potentially include an example yaml in scripts/train/pretrain with flash attention 2 as well?

@vchiley (Contributor) commented Jan 4, 2024

> This looks good to me! Could we potentially include an example yaml in scripts/train/pretrain with flash attention 2 as well?

If you install llm-foundry using `pip install -e ".[gpu-flash2]"` (instead of `pip install -e ".[gpu]"`) and set `attn_impl: flash`, MPT will use FA2.
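For illustration, a hypothetical excerpt of such a pretraining YAML; the key names follow MPT's `attn_config`, and the exact values are placeholders rather than a file from this PR:

```yaml
model:
  name: mpt_causal_lm
  attn_config:
    attn_impl: flash   # FlashAttention-2 when installed via ".[gpu-flash2]"
    alibi: true        # ALiBi slopes are passed through to the flash kernel
```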

@dakinggg (Collaborator) left a comment

lgtm, will leave approval for @irenedea

ShashankMosaicML and others added 2 commits January 5, 2024 11:00
Co-authored-by: Irene Dea <deaairene@gmail.com>
@ShashankMosaicML merged commit d991f37 into mosaicml:main on Jan 5, 2024
10 checks passed
@ShashankMosaicML deleted the shashank/alibi_flash_attn branch on January 5, 2024