Codecov Report

@@            Coverage Diff             @@
##           master     #733      +/-   ##
==========================================
+ Coverage   90.56%   90.58%   +0.02%
==========================================
  Files          65       66       +1
  Lines        6071     6118      +47
==========================================
+ Hits         5498     5542      +44
- Misses        573      576       +3
src/gluonnlp/optimizer/lamb.py (outdated)

var_hat = var / (1. - power(self.beta2, t))

r1 = weight.norm()
g = mean_hat / sqrt(var_hat + self.epsilon) + wd * weight
Put epsilon outside the sqrt.
I compared this against the formula in the original paper, and there epsilon is inside the sqrt.
They changed the algorithm in their newest arXiv version.
Thanks for the reminder. The paper version I referenced was not in sync with the latest one; I will update the relevant code.
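To make the difference under discussion concrete, here is a small sketch in plain NumPy (not the gluonnlp code; variable names mirror the snippet above, and the random values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
weight = rng.normal(size=4)    # current parameters
mean_hat = rng.normal(size=4)  # bias-corrected first moment
var_hat = rng.uniform(size=4)  # bias-corrected second moment
wd, epsilon = 0.01, 1e-6

# Variant referenced initially: epsilon inside the sqrt
g_inside = mean_hat / np.sqrt(var_hat + epsilon) + wd * weight

# Newest arXiv variant: epsilon outside the sqrt
g_outside = mean_hat / (np.sqrt(var_hat) + epsilon) + wd * weight
```

For well-behaved var_hat the two are numerically close; they differ mainly when var_hat is near zero, where the placement of epsilon controls how the division is stabilized.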
src/gluonnlp/optimizer/lamb.py (outdated)

# calculate lamb_trust_ratio
r = 1. if r1 == 0. or r2 == 0. else minimum(
    maximum(r1 / r2, self.lower_bound), self.upper_bound)
The clip function should be applied only to r1, and g should be normalized by r2.
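A sketch of the two readings, using hypothetical helper names (trust_ratio_old, trust_ratio_new are not in the PR; this is only to illustrate the reviewer's point):

```python
def trust_ratio_old(r1, r2, lower, upper):
    # Outdated snippet above: clip the whole ratio r1 / r2.
    if r1 == 0.0 or r2 == 0.0:
        return 1.0
    return min(max(r1 / r2, lower), upper)

def trust_ratio_new(r1, r2, lower, upper):
    # Suggested reading: clip r1 (the weight norm) alone,
    # then normalize by r2 (the norm of g).
    if r1 == 0.0 or r2 == 0.0:
        return 1.0
    return min(max(r1, lower), upper) / r2
```

The two agree in many regimes but diverge when the bounds bind, e.g. trust_ratio_old(5.0, 0.5, 0.0, 3.0) clips the ratio 10 down to 3, while trust_ratio_new clips r1 to 3 and returns 6.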
src/gluonnlp/optimizer/lamb.py (outdated)

# execution bias correction
mean_hat = mean / (1. - power(self.beta1, t))
var_hat = var / (1. - power(self.beta2, t))
It seems that bias correction is not performed in the algorithm. This needs a double check.
What if we keep both versions of LAMB and test which one is better ourselves? There are significant differences between the two, especially the presence of bias correction. Just add a flag so that users can choose which one to use.
The two versions are now controlled by the use_latest parameter. I have tested both versions on a small model and there is no obvious difference. I think they should be tested on larger models and data.
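A minimal sketch of what such a use_latest switch could look like for the moment estimates (a hypothetical helper, not the actual PR code; parameter names follow the snippets in this thread):

```python
def lamb_moments(mean, var, beta1, beta2, t, use_latest):
    """Return the moment estimates used in the LAMB update.

    use_latest=True follows the newest arXiv version (no bias
    correction); use_latest=False applies Adam-style bias correction.
    """
    if use_latest:
        return mean, var
    mean_hat = mean / (1.0 - beta1 ** t)
    var_hat = var / (1.0 - beta2 ** t)
    return mean_hat, var_hat
```

At t=1 the corrected estimates are much larger than the raw moments (e.g. mean / 0.1 for beta1=0.9), which is one reason the two variants can behave differently early in training.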
Minor issue in the comments.
LGTM
* add lamb optimizer to gluonnlp
* add lamb optimizer to gluonnlp
* Add a simple test for LAMB to verify if it will converge
* add the latest version of the calculation for LAMB
* update doc of lamb
* add optimizer to the docs
* rename and remove arguments
* Correction of typos
* fix lint
* fix doc lint
* update doc
Description
This PR adds the LAMB optimizer, proposed in "Reducing BERT Pre-Training Time from 3 Days to 76 Minutes".
A simple neural network was used for verification, and the results show that the model converges normally with a large batch size. Further verification on BERT is still needed.
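For reference, a single LAMB update step along the lines discussed in the review threads can be sketched as follows (an illustrative NumPy-only version with assumed default hyperparameters, not the gluonnlp implementation):

```python
import numpy as np

def lamb_step(weight, grad, mean, var, lr=0.01, beta1=0.9, beta2=0.999,
              epsilon=1e-6, wd=0.01, lower_bound=1e-3, upper_bound=10.0):
    # Update exponential moving averages of the gradient moments.
    mean = beta1 * mean + (1.0 - beta1) * grad
    var = beta2 * var + (1.0 - beta2) * grad * grad
    # Adam-style direction plus decoupled weight decay
    # (epsilon outside the sqrt, per the newest arXiv version).
    g = mean / (np.sqrt(var) + epsilon) + wd * weight
    # Layer-wise trust ratio, clipped to [lower_bound, upper_bound].
    r1 = np.linalg.norm(weight)
    r2 = np.linalg.norm(g)
    r = 1.0 if r1 == 0.0 or r2 == 0.0 else min(max(r1 / r2, lower_bound),
                                               upper_bound)
    weight = weight - lr * r * g
    return weight, mean, var
```

On a toy quadratic objective (grad = weight), repeatedly applying this step drives the weight norm down, which matches the convergence behavior reported above for the small-model test.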
@eric-haibin-lin Verification may require your help because of some computing resource limitations.
Checklist
Essentials
Changes
Comments