Optionally use flash-attn's CE loss for metrics #3394

Merged · 16 commits into dev from saaketh/fa_ce_loss · Jun 17, 2024

Conversation

@snarayan21 (Contributor) commented Jun 11, 2024

What does this PR do?

Resubmission of #3214 -- using flash-attn's CE loss results in lower peak reserved memory usage and higher throughput. We are not adding flash-attn as an optional dependency of Composer, since that makes installs and correct builds messy and much slower.
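
For context, a minimal sketch of the optional selection this describes: prefer flash-attn's fused cross-entropy loss when the package is installed and CUDA is available, otherwise fall back to torch's implementation. The import path comes from the flash-attn package; the `build_ce_loss` wrapper and its signature are illustrative, not the actual Composer code.

```python
# Illustrative sketch, not the actual Composer implementation.
import torch

try:
    # flash-attn ships a fused cross-entropy loss that only supports CUDA tensors.
    from flash_attn.losses.cross_entropy import CrossEntropyLoss as FlashCrossEntropyLoss
    _FLASH_CE_AVAILABLE = True
except ImportError:
    _FLASH_CE_AVAILABLE = False


def build_ce_loss(ignore_index: int = -100) -> torch.nn.Module:
    """Prefer flash-attn's CE loss when usable; otherwise fall back to torch."""
    if _FLASH_CE_AVAILABLE and torch.cuda.is_available():
        return FlashCrossEntropyLoss(ignore_index=ignore_index)
    return torch.nn.CrossEntropyLoss(ignore_index=ignore_index)
```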

Fixed a small typo where the torch 3.11 CPU tests were accidentally using the GPU image with flash-attn installed.

Also modified the DeviceGPU class so that it instantiates a Gloo backend for CPU tensors, if Gloo is available. This handles cases where users want to perform distributed operations on CPU tensors even when they are using GPUs.
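
Roughly, the DeviceGPU change amounts to something like the sketch below. `torch.distributed.is_gloo_available` and the device-qualified backend string are existing PyTorch features; the helper function itself is assumed for illustration and is not the actual Composer code.

```python
import torch.distributed as dist


def init_gpu_dist_backend() -> None:
    """Initialize NCCL for GPU tensors and, if available, Gloo for CPU tensors."""
    if dist.is_gloo_available():
        # A device-qualified backend string routes CUDA collectives to NCCL
        # and CPU collectives to Gloo within the same process group.
        dist.init_process_group(backend='cuda:nccl,cpu:gloo')
    else:
        dist.init_process_group(backend='nccl')
```

This mirrors PyTorch's default behavior of initializing both a GPU and a CPU backend when no backend is specified.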

Manual tests:

  • Run started on dev (13b-dense-fsdp-fullshard-hsdp-adam-shardedckpt-start-5PtEdK), resumed with this branch (13b-dense-fsdp-fullshard-hsdp-adam-shardedckpt-resume-E5SieL)
  • Run started on this branch (13b-dense-fsdp-fullshard-hsdp-adam-shardedckpt-start-0g8uD4), resumed with dev branch (13b-dense-fsdp-fullshard-hsdp-adam-shardedckpt-resume-TSGoUC)

4th time's the charm :0

Run with torch CE loss (green): tiny-sp-dtms1-32h-wCFWfa
Run with FA CE loss (tan): tiny-sp-dtms1-32h-jOfIPL

[Two screenshots attached, captured 2024-06-11, comparing the torch CE loss and FA CE loss runs]

What issue(s) does this change relate to?

Before submitting

  • Have you read the contributor guidelines?
  • Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
  • Did you update any related tests and add any new tests related to your change? (see testing)
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

@snarayan21 snarayan21 requested a review from a team as a code owner June 11, 2024 22:56
@dakinggg (Contributor) left a comment

Holding review until after freeze

@mvpatel2000 (Contributor) left a comment

Can we offer a flag to gate this as well? IIRC there are occasionally numerics issues for long sequences...

@ShashankMosaicML do you remember?

@ShashankMosaicML (Contributor)

> Can we offer a flag to gate this as well? IIRC there are occasionally numerics issues for long sequences...
>
> @ShashankMosaicML do you remember?

Flash attention fixed the long-sequence issue in this commit: Dao-AILab/flash-attention@c79de85

@dakinggg (Contributor) left a comment

will review once CI passes

@snarayan21 (Contributor, Author)

Seeing the error below on CPU tests:

>       assert input.is_cuda and target.is_cuda, "Only support CUDA tensors"
E       AssertionError: Only support CUDA tensors

So I'm going to add a check for torch.cuda.is_available().

@snarayan21 (Contributor, Author)

Never mind -- the torch 3.11 CPU tests were using the CUDA image by accident, which caused this problem. It was only the torch 3.11 tests. Fixed that in this PR as well.

@snarayan21 snarayan21 requested a review from a team as a code owner June 14, 2024 23:05
@dakinggg (Contributor) left a comment

lgtm

@dakinggg (Contributor) left a comment

Add unit tests for this before merging, please.

@mvpatel2000 (Contributor) left a comment

holding till offline discussion

@mvpatel2000 (Contributor) left a comment

LGTM. Note that changing the GPU device to provide both backends mirrors PyTorch's default behavior, which initializes both a GPU and a CPU dist backend.

@snarayan21 (Contributor, Author)

Added manual test names to PR description

@snarayan21 snarayan21 merged commit 2cf9262 into dev Jun 17, 2024
17 checks passed
@snarayan21 snarayan21 deleted the saaketh/fa_ce_loss branch June 17, 2024 19:09
snarayan21 added a commit that referenced this pull request Jun 18, 2024
snarayan21 added a commit that referenced this pull request Jun 18, 2024
snarayan21 added a commit that referenced this pull request Jun 18, 2024
* Revert "Optionally use `flash-attn`'s CE loss for metrics (#3394)"

This reverts commit 2cf9262.

revert dat boi

* remove

* slamm
@snarayan21 snarayan21 restored the saaketh/fa_ce_loss branch June 19, 2024 04:31
mvpatel2000 added a commit to mvpatel2000/composer that referenced this pull request Jul 21, 2024
* yo

* slam

* cuda

* cuda checks

* test

* fix_test

* gloo

* gloo

* lint

* lint

---------

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
mvpatel2000 pushed a commit to mvpatel2000/composer that referenced this pull request Jul 21, 2024
mvpatel2000 pushed a commit to mvpatel2000/composer that referenced this pull request Jul 21, 2024
* Revert "Optionally use `flash-attn`'s CE loss for metrics (mosaicml#3394)"

This reverts commit 2cf9262.

revert dat boi

* remove

* slamm
mvpatel2000 added a commit that referenced this pull request Jul 21, 2024
mvpatel2000 pushed a commit that referenced this pull request Jul 21, 2024
mvpatel2000 pushed a commit that referenced this pull request Jul 21, 2024