[FEATURE] Bert fp16 norm: multi-tensor sum_sq #1115

MoisesHer · 2020-01-15T21:58:10Z

Description

Use the multi-tensor sum of squares when computing the gradient norm in BERT training.

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

Changes

Instead of computing the dot product of each tensor with itself sequentially and summing the results afterwards, this PR uses the MXNet operator multi_sum_sq, which computes the sum of squares of several tensors in parallel.
If multiple contexts/devices are available, the tensors are distributed among them, and the per-device results are later reduced into a single scalar.
Added a test in test/test_utils.py (test_grad_global_norm)
Modified clip_grad_global_norm to use grad_global_norm
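
For readers who want the arithmetic spelled out, here is a minimal pure-Python sketch of the idea described in the list above. It is not the code in this PR and does not call MXNet: `multi_sum_sq_standin`, `global_norm`, and `clip_by_global_norm` are hypothetical names, the round-robin distribution over devices is an assumption, and the `1e-8` term in the scale factor is an illustrative choice.

```python
import math


def multi_sum_sq_standin(tensors):
    """Stand-in for a fused multi-tensor op: one sum of squares per tensor."""
    return [sum(x * x for x in t) for t in tensors]


def global_norm(tensors, devices=("dev0",)):
    """Global L2 norm of a list of flat tensors (lists of floats)."""
    # Assumption: tensors are assigned to devices round-robin.
    groups = {d: [] for d in devices}
    for i, t in enumerate(tensors):
        groups[devices[i % len(devices)]].append(t)
    # One batched call per device instead of one call per tensor.
    partial_sums = [sum(multi_sum_sq_standin(g)) for g in groups.values()]
    # Reduce the per-device partial sums into a single scalar.
    return math.sqrt(sum(partial_sums))


def clip_by_global_norm(tensors, max_norm, devices=("dev0",)):
    """Rescale tensors in place so their global norm is at most max_norm."""
    norm = global_norm(tensors, devices)
    # Only ever shrink; the small constant guards against division by zero.
    scale = min(1.0, max_norm / (norm + 1e-8))
    for t in tensors:
        for i in range(len(t)):
            t[i] *= scale
    return norm
```

Calling `clip_by_global_norm([[3.0], [0.0, 0.0], [4.0]], max_norm=1.0, devices=("dev0", "dev1"))` returns the norm measured before clipping and rescales the tensors in place; the result does not depend on how the tensors are split across devices, because the partial sums are added before the square root is taken.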

Comments

codecov · 2020-01-15T21:58:12Z

Codecov Report

Merging #1115 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1115      +/-   ##
==========================================
- Coverage   88.25%   88.21%   -0.05%     
==========================================
  Files          67       66       -1     
  Lines        6275     6312      +37     
==========================================
+ Hits         5538     5568      +30     
- Misses        737      744       +7

Impacted Files	Coverage Δ
src/gluonnlp/utils/parameter.py	`87.09% <100%> (+3.16%)`	⬆️
src/gluonnlp/utils/files.py	`42.62% <0%> (-6.4%)`	⬇️
src/gluonnlp/optimizer/bert_adam.py	`87.32% <0%> (-5.86%)`	⬇️
src/gluonnlp/model/train/language_model.py	`88.51% <0%> (-5.27%)`	⬇️
src/gluonnlp/optimizer/__init__.py	`100% <0%> (ø)`	⬆️
src/gluonnlp/utils/version.py	`100% <0%> (ø)`	⬆️
src/gluonnlp/optimizer/lamb.py
src/gluonnlp/base.py	`89.65% <0%> (+3.44%)`	⬆️
src/gluonnlp/data/utils.py	`86.39% <0%> (+12.34%)`	⬆️

mli · 2020-01-15T22:36:53Z

Job PR-1115/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1115/1/index.html

eric-haibin-lin

Thanks. Could you also update https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/utils.py#L118 clip_global_norm in mxnet with the multi square sum op? Thanks

scripts/bert/fp16_utils.py

leezu · 2020-01-16T12:55:01Z

@MoisesHer please fix lint

src/gluonnlp/utils/parameter.py

mli · 2020-01-16T21:35:00Z

Job PR-1115/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1115/2/index.html

mli · 2020-01-16T23:27:34Z

Job PR-1115/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1115/3/index.html

eric-haibin-lin

One more comment; otherwise looks good to me.

src/gluonnlp/utils/parameter.py

mli · 2020-01-20T22:42:57Z

Job PR-1115/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1115/4/index.html

Using multi_sum_sq Op to compute norm

4dd9bb0

MoisesHer requested a review from a team as a code owner January 15, 2020 21:58

eric-haibin-lin reviewed Jan 15, 2020

scripts/bert/fp16_utils.py (outdated)

eric-haibin-lin added the "release focus" label (Progress focus for release) Jan 15, 2020

move grad_global_norm into gluonnlp.utils and add test

b0d035e

eric-haibin-lin reviewed Jan 16, 2020

src/gluonnlp/utils/parameter.py

Early return if not max_norm, and fix lint

a196144

eric-haibin-lin reviewed Jan 18, 2020

src/gluonnlp/utils/parameter.py (outdated)

avoid evaluation of condition: max_norm argument

ca4e8e5

eric-haibin-lin approved these changes Jan 21, 2020

eric-haibin-lin merged commit f19ace9 into dmlc:master Jan 21, 2020
