Adding a fix for Cross Entropy Loss for long sequence lengths. #795

Merged: 23 commits into mosaicml:main on Dec 12, 2023

Conversation

@ShashankMosaicML (Contributor) commented on Dec 9, 2023

For longer sequence lengths (>16k), the current cross entropy loss implementation throws an illegal memory access error. This PR fixes the issue by reverting to an older version of the cross entropy loss that also works for longer sequence lengths.
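The actual change lives in the diff rather than this description. As a rough illustration only, here is a minimal sketch of the general pattern of guarding a fused cross entropy kernel with a plain PyTorch fallback. The `flash_attn.losses.cross_entropy` import path reflects flash-attn's layout at the time and may differ across versions, and `MAX_FUSED_SEQ_LEN` is a hypothetical cutoff based on the ~16k failure reported above; neither is taken from this PR's diff.

```python
# Sketch (not the PR's actual fix): choose between a fused cross entropy
# kernel and torch's reference implementation based on sequence length.
import torch
import torch.nn as nn

try:
    # Fused cross entropy from flash-attn. The version current at the time
    # of this PR raised an illegal memory access for sequence lengths >16k.
    from flash_attn.losses.cross_entropy import (
        CrossEntropyLoss as FusedCrossEntropyLoss,
    )
    HAS_FUSED_CE = True
except ImportError:
    HAS_FUSED_CE = False

MAX_FUSED_SEQ_LEN = 16_384  # hypothetical threshold from the reported failure


def build_loss_fn(seq_len: int) -> nn.Module:
    """Use the fused kernel only when it is known to be safe."""
    if HAS_FUSED_CE and seq_len <= MAX_FUSED_SEQ_LEN:
        return FusedCrossEntropyLoss(ignore_index=-100)
    return nn.CrossEntropyLoss(ignore_index=-100)


# Usage: logits are flattened to (batch * seq_len, vocab) before the loss.
loss_fn = build_loss_fn(seq_len=32_768)  # long context -> plain torch CE
logits = torch.randn(16, 128)            # toy (batch * seq, vocab) example
targets = torch.randint(0, 128, (16,))
loss = loss_fn(logits, targets)
```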

WandB link for the experiments: https://wandb.ai/mosaic-ml/longcont-CE-debug

We see the same convergence, MFU, and memory usage before and after the fix.
[Screenshots: loss convergence, MFU, and memory usage curves before and after the fix]

@ShashankMosaicML ShashankMosaicML marked this pull request as ready for review December 10, 2023 00:54
@ShashankMosaicML ShashankMosaicML merged commit 96cf646 into mosaicml:main Dec 12, 2023
8 checks passed
@ShashankMosaicML ShashankMosaicML deleted the shashank/fix_FA_CE branch December 12, 2023 19:39