
Torch2 (#177) #178

Merged
merged 7 commits into from
May 19, 2023

Conversation

@vchiley (Contributor) commented May 19, 2023

Moves #149 into the main repo (from a fork).

Uses #147 as a springboard to update torch.

In an interactive instance, I installed the torch2 requirements and everything works fine.

A 125M model was getting good (and identical) MFU from the exact same config in both torch1.13 and torch2.
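For reference, MFU can be approximated from parameter count and throughput; a minimal sketch (the ~6 FLOPs/param/token rule of thumb, the 100k tokens/s throughput, and the A100 peak are illustrative assumptions, not numbers from this PR):

```python
def approx_mfu(n_params: float, tokens_per_sec: float, peak_flops_per_sec: float) -> float:
    """Rough model-FLOPs-utilization: ~6 FLOPs per parameter per token
    covers a decoder-only forward+backward pass (rule of thumb)."""
    model_flops_per_sec = 6 * n_params * tokens_per_sec
    return model_flops_per_sec / peak_flops_per_sec

# e.g. a 125M-param model at 100k tokens/s on an A100 (~312 TFLOP/s bf16 peak)
print(f"{approx_mfu(125e6, 100_000, 312e12):.1%}")  # → 24.0%
```

Comparing this number across torch versions (as above) is meaningful because the config, and hence the numerator, is identical; only achieved throughput changes.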

Note: with torch2, pip list shows both triton versions:

torch                  2.0.1+cu118

triton                 2.0.0
triton-pre-mlir        2.0.0

This doesn't seem to matter.

Note: this does not use torch.compile() (but there is no reason it couldn't).
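If someone does want to try torch.compile() on top of this, a minimal sketch (the toy model is illustrative; backend="eager" is a debug backend chosen here so the example runs without the inductor/C++ toolchain):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16))

# backend="eager" exercises the torch.compile capture path without
# requiring a compiler toolchain; drop the argument to use inductor.
compiled = torch.compile(model, backend="eager")

x = torch.randn(4, 16)
assert compiled(x).shape == (4, 16)
```

With the eager backend the compiled module should produce the same outputs as the original; inductor may introduce small numeric differences.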

Note: flash-attn is still installed, and xentropy-cuda-lib is also still installed; since I'm not setting loss_fn, mpt defaults to fused_crossentropy in both settings.
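The loss_fn default described above amounts to a fused-kernel-with-fallback pattern; a hedged sketch (the flash_attn import path is an assumption about how xentropy-cuda-lib is typically exposed, not quoted from this repo):

```python
import torch

try:
    # fused cross-entropy kernel from xentropy-cuda-lib,
    # commonly exposed via the flash-attn package
    from flash_attn.losses.cross_entropy import CrossEntropyLoss
    loss_fn = CrossEntropyLoss()
except ImportError:
    # plain PyTorch fallback when the fused kernel is unavailable
    loss_fn = torch.nn.CrossEntropyLoss()

logits = torch.randn(8, 50257)           # (batch, vocab)
targets = torch.randint(0, 50257, (8,))  # class indices
print(loss_fn(logits, targets))          # scalar loss tensor
```

Because both branches expose the same call signature, the rest of the training loop is unchanged regardless of which loss lands in loss_fn.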

Biggest low-probability risk: this old version of triton may not compile or work on H100s... 👀
Risk: triton_pre_mlir is unsupported and will never be updated.

Still need to test at scale / for convergence;
see torch2 vs torch1.13 producing the same results here.

cc @sashaDoubov (enables torch2 for muP dev)
cc @dskhudia (enables torch2 and torch.compile() with the triton attn impl)

Old PR commits:

  • make triton attn req mlri tagged triton

  • add comment

  • updt err

  • clean up req / install

  • exclude HazyR flash attn from pyright

  • lint

  • exclude flash_attn_triton.py from pyright

  • updt torch version & install instructions

  • add extra install instructions for installing CMake

  • lint

  • adding torch1.13 and torch2 testing matrix

@mvpatel2000 (Collaborator) commented: LGTM

@mvpatel2000 mvpatel2000 merged commit bb7f8bb into main May 19, 2023
7 checks passed
@mvpatel2000 mvpatel2000 deleted the vitaliy/torch2 branch May 19, 2023 22:21
@vchiley vchiley mentioned this pull request May 19, 2023
dakinggg added a commit to dakinggg/llm-foundry that referenced this pull request May 20, 2023
dakinggg added a commit that referenced this pull request May 20, 2023
vchiley added a commit to vchiley/llm-foundry that referenced this pull request May 22, 2023
@vchiley vchiley restored the vitaliy/torch2 branch May 23, 2023 16:49
vchiley added a commit that referenced this pull request May 24, 2023
* fix and test

* Revert "Revert "Torch2 (#177) (#178)" (#181)"

This reverts commit 89f56d2.

* updt import try except

* updt hf model

* updt imports

* lint

* add mpt hf model init / gen test

* updt for temp testing

* lint

* rerun tests

* Update .github/workflows/release.yaml

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

* Update tests/test_hf_mpt_gen.py

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

* add cpu test

* updt tests / cpu img

* updt cpu test install

* rerun tests

* fix hf import structure

* fix test

* pull_request -> pull_request_target

* make onnx test smaller

---------

Co-authored-by: Daniel King <daniel@mosaicml.com>
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
bmosaicml pushed a commit that referenced this pull request Jun 6, 2023
bmosaicml pushed a commit that referenced this pull request Jun 8, 2023
bmosaicml pushed a commit that referenced this pull request Jun 8, 2023