Here is the issue:
For FP16 training, we use NVIDIA/apex.
They changed the API to something called AMP, and we adapted to this new API on June 13.
However, a few things in this new API are broken when it comes to handling FusedAdam.
On Aug 8 they included FusedAdam in the new AMP API, but I realized that:
The O2 level does not work at all, for either Adam or FusedAdam (O2 was our default level);
see NVIDIA/apex#475
The O1 level works fine with Adam but is unstable with FusedAdam.
This PR will:
This solution should be temporary, since Nvidia is working on including AMP directly in PyTorch.
As of this PR, accuracy/PPL are OK for Adam FP32, Adam FP16, and FusedAdam FP16.