
Add linear layer and ffn config to enable TransformerEngine layers (with FP8) #432

Merged: 42 commits into mosaicml:main from cfgte, Jul 17, 2023

Conversation

@vchiley (Contributor) commented Jul 6, 2023

This PR adds a config for linear layers and FFN modules, which allows the use of TransformerEngine's te.Linear and te.LayerNormMLP modules (which support FP8 when training with amp.fp8).
I also did a little cleanup.

This PR is built on top of #271

In the future, this will also allow us to add and prototype other linear layers and FFN blocks. It also enables us to configure TP/SP for the MLP block in the build_ffn util fn.

AMP FP8 training gets results that are nearly identical to AMP BF16 but has a faster runtime:
[image: AMP FP8 vs. AMP BF16 training results]

Furthermore, setting ffn_config_defaults: ffn_type: te_ln_mlp allows us to use TransformerEngine's LayerNormMLP layer, which has SP and TP support if configured correctly.
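For concreteness, here is a minimal, hypothetical sketch of how these options might be combined in a model config. The fc_type, ffn_config, and ffn_type: te_ln_mlp keys come from this PR's description; the surrounding fields, the 'te' value, and the note about precision are assumptions rather than verbatim from this PR:

```python
# Hypothetical sketch only: selecting TransformerEngine layers via the new configs.
model_cfg = {
    'name': 'mpt_causal_lm',
    'd_model': 2048,
    'n_heads': 16,
    'n_layers': 24,
    'expansion_ratio': 4,        # stays at the top level (see discussion below)
    'fc_type': 'te',             # assumed value: use te.Linear for fully connected layers
    'ffn_config': {
        'ffn_type': 'te_ln_mlp', # use TransformerEngine's LayerNormMLP for the FFN block
    },
}

# FP8 itself is enabled through the training precision (the PR mentions amp.fp8),
# not through the model config; the exact trainer key is an assumption here.
```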

@vchiley self-assigned this Jul 6, 2023
@vchiley force-pushed the cfgte branch 3 times, most recently from 92dbf7e to 23ff7b9 on July 6, 2023 at 15:08
@vchiley marked this pull request as ready for review on July 6, 2023 at 20:19
@vancoykendall (Contributor) commented:

I'm excited that this PR is being worked on, mainly because I've been extending the MPTMLP myself to try gated linear units (GEGLU, SwiGLU, etc.) and also decided to add an ffn_config in my implementation. I was wondering, though, why expansion_ratio is not part of the ffn_config? From my understanding, expansion_ratio is only used when building the FFN block, so it makes more sense to me for it to be defined in the ffn_config. Also, what does fc stand for in fc_type? It isn't clear to me.
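(Illustrative only, not part of this PR.) A minimal sketch of the kind of gated-linear-unit FFN described above, using SwiGLU as the example; the class and argument names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Hypothetical SwiGLU FFN: silu(gate(x)) * up(x), projected back to d_model."""

    def __init__(self, d_model: int, expansion_ratio: int = 4):
        super().__init__()
        hidden = expansion_ratio * d_model
        self.gate_proj = nn.Linear(d_model, hidden, bias=False)  # gating branch
        self.up_proj = nn.Linear(d_model, hidden, bias=False)    # value branch
        self.down_proj = nn.Linear(hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```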

@vchiley (Contributor, Author) commented Jul 6, 2023

@vancoykendall

fc stands for fully connected layer.

While expansion_ratio should really be part of the ffn_config, moving it into the ffn_config would break backwards compatibility with models already uploaded to the HuggingFace Hub and with all of our internal checkpoints, so we are leaving that config variable alone.
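A minimal sketch of what that split might look like: expansion_ratio stays a top-level model-config field and is passed into the FFN builder alongside ffn_config. The build_ffn name comes from the PR description, but the exact signature and branches below are assumptions:

```python
import torch.nn as nn

def build_ffn(d_model: int, expansion_ratio: int, ffn_config: dict) -> nn.Module:
    """Hypothetical builder: expansion_ratio arrives separately from ffn_config."""
    ffn_type = ffn_config.get('ffn_type', 'mptmlp')
    hidden = expansion_ratio * d_model
    if ffn_type == 'mptmlp':
        return nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.GELU(),
            nn.Linear(hidden, d_model),
        )
    elif ffn_type == 'te_ln_mlp':
        # TransformerEngine path; requires transformer_engine to be installed.
        import transformer_engine.pytorch as te
        return te.LayerNormMLP(d_model, hidden)
    raise ValueError(f'Unknown ffn_type: {ffn_type}')
```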

Review thread on scripts/train/train.py (outdated, resolved)
@sashaDoubov (Contributor) commented:

LGTM

@vchiley merged commit 340a566 into mosaicml:main on Jul 17, 2023
10 checks passed