update fsdp mixed precision #2047
Conversation
LGTM, would like abhi to approve as well
fix to test failing in #2050
LGTM!
@vchiley can we get a side-by-side LLM run before and after this change? Aka can we do just a small 125M run with previous …
Testing the behavior change here:

* gpt125m-flash-mpf is using mp: (config elided from this page)
* gpt125m-flash-mpp16b16 is using mp: (config elided from this page)

Training MosaicGPT 125M to the chinchilla point on 8 GPUs shows no diff between any of the runs. The speed is virtually the same for all of the configs, except 'FULL' is ~3% slower.

I tried running the 7B model with the same set of 4 different mp configs on 32 GPUs to see performance: it seems like PURE and the new DEFAULT run at the same speed and are faster than the old DEFAULT, and FULL is slowest.
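For reference, a hypothetical sketch of how one of these runs might select a preset through Composer's `Trainer`; the `model` and `train_dataloader` variables are placeholders, and the exact configs of the benchmarked runs were not captured here:

```python
from composer import Trainer

# Hypothetical run setup; `model` and `train_dataloader` are placeholders
# for a MosaicGPT model and its data pipeline.
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    precision='amp_bf16',                        # AMP autocast dtype for the run
    fsdp_config={'mixed_precision': 'DEFAULT'},  # or 'FULL' / 'PURE'
)
trainer.fit()
```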
* update fsdp mixed precision

* Update docs/source/notes/distributed_training.rst

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

* Update docs/source/notes/distributed_training.rst

* Apply suggestions from code review

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

---------

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
What does this PR do?
Our MixedPrecision setting 'DEFAULT' should be updated to:
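A minimal sketch of the intended mapping, assuming PyTorch's `torch.distributed.fsdp.MixedPrecision` API and inferring the dtypes from the AMP-emulation goal stated below (the exact values in the PR may differ):

```python
import torch
from torch.distributed.fsdp import MixedPrecision

# Sketch only: dtypes inferred from "emulate AMP behavior", not copied
# from the actual diff.
autocast_dtype = torch.bfloat16  # or torch.float16, matching the AMP dtype

default_mixed_precision = MixedPrecision(
    param_dtype=autocast_dtype,   # forward/backward compute in low precision
    reduce_dtype=torch.float32,   # gradient all-reduce in fp32, as AMP does
    buffer_dtype=autocast_dtype,  # buffers follow the compute dtype
)
```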
This will make 'DEFAULT' emulate AMP behavior.
* use 'FULL' if you need full (fp32) precision
* use 'PURE' if you want to live dangerously
* standard AMP should be the 'DEFAULT' MixedPrecision setting (see above)

The current 'DEFAULT' shouldn't really be used.
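For concreteness, a hypothetical helper showing how the three presets might map onto FSDP's `MixedPrecision` under the semantics described above; the function name is made up here and is not Composer's actual implementation:

```python
import torch
from torch.distributed.fsdp import MixedPrecision

def mixed_precision_for(preset: str,
                        autocast_dtype: torch.dtype = torch.bfloat16) -> MixedPrecision:
    """Illustrative preset-to-dtype mapping (hypothetical helper)."""
    if preset == 'FULL':  # everything in fp32
        return MixedPrecision(param_dtype=torch.float32,
                              reduce_dtype=torch.float32,
                              buffer_dtype=torch.float32)
    if preset == 'DEFAULT':  # AMP-like: low-precision compute, fp32 grad reduction
        return MixedPrecision(param_dtype=autocast_dtype,
                              reduce_dtype=torch.float32,
                              buffer_dtype=autocast_dtype)
    if preset == 'PURE':  # everything in low precision
        return MixedPrecision(param_dtype=autocast_dtype,
                              reduce_dtype=autocast_dtype,
                              buffer_dtype=autocast_dtype)
    raise ValueError(f'unknown mixed precision preset: {preset!r}')
```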
What issue(s) does this change relate to?
Fixes https://mosaicml.atlassian.net/browse/CO-1896
Before submitting

* Did you run `pre-commit` on your change? (see the `pre-commit` section of the prerequisites)