FEATURE Bert fp16 norm: multi-tensor sum_sq #1115
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1115      +/-   ##
==========================================
- Coverage   88.25%   88.21%   -0.05%
==========================================
  Files          67       66       -1
  Lines        6275     6312      +37
==========================================
+ Hits         5538     5568      +30
- Misses        737      744       +7
Job PR-1115/1 is complete.
Thanks. Could you also update clip_global_norm in MXNet (https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/utils.py#L118) to use the multi-tensor sum-of-squares op?
@MoisesHer please fix lint
Job PR-1115/2 is complete.
Job PR-1115/3 is complete.
One more comment. Otherwise looks good to me.
Job PR-1115/4 is complete.
Description
Use the multi-tensor sum-of-squares operator when computing the norm in BERT training.
Changes
If multiple contexts/devices are available, the tensors are distributed among them, the sum of squares is computed on each device, and the partial results are then reduced into a single scalar.
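The flow described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from this PR: `multi_sum_sq` and `global_norm` are hypothetical helper names, lists of floats stand in for tensors, and the device labels are made-up strings.

```python
import math
from collections import defaultdict


def multi_sum_sq(arrays):
    """Hypothetical stand-in for a multi-tensor op: one sum of squares per array."""
    return [sum(x * x for x in arr) for arr in arrays]


def global_norm(arrays, devices):
    """Compute the L2 norm over all arrays, grouping the work by device."""
    # Group the arrays by the device they are assigned to.
    by_device = defaultdict(list)
    for arr, dev in zip(arrays, devices):
        by_device[dev].append(arr)

    # One multi-tensor call per device gives one partial sum per device.
    partials = [sum(multi_sum_sq(group)) for group in by_device.values()]

    # Reduce the per-device partial sums into a single scalar, then take the root.
    return math.sqrt(sum(partials))


# Illustrative data only: three "tensors" spread over two made-up devices.
grads = [[3.0, 0.0], [0.0, 4.0], [12.0]]
devices = ["gpu0", "gpu1", "gpu0"]
total_norm = global_norm(grads, devices)
```

The point of the grouping step is that each device handles all of its tensors in one batched call rather than one call per tensor, so only one partial scalar per device has to be combined at the end.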