
[enhancement] refactor bert finetuning script #692

Merged: 16 commits merged into dmlc:master on May 7, 2019

Conversation

@eric-haibin-lin (Member) commented May 3, 2019

Description

Summary of changes:

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc.)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this is a backward-incompatible change, why must it be made?
  • Interesting edge cases to note here

@eric-haibin-lin eric-haibin-lin requested a review from szha as a code owner May 3, 2019 21:56
@eric-haibin-lin (Member Author)

@Gpwner here is the fix for the BERTTransform bug. Thanks for reporting it.
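
For readers following the bug: a minimal sketch of the pattern such a fix implies, namely stripping the label off each row before the text fields reach BERTSentenceTransform, whose pair mode asserts exactly two sentences. The class and attribute names (BERTDatasetTransform, _bert_xform) are taken from the traceback later in this thread, not from the merged diff, so treat this as an illustration rather than the actual change.

    # Hypothetical sketch, not the merged implementation: keep the label
    # out of the tuple handed to BERTSentenceTransform.
    import numpy as np
    from gluonnlp.data import BERTSentenceTransform

    class BERTDatasetTransform(object):
        """Transform (sentence_a[, sentence_b], label) rows for BERT."""

        def __init__(self, tokenizer, max_seq_length, label_map, pair=True):
            self._label_map = label_map
            self._bert_xform = BERTSentenceTransform(
                tokenizer, max_seq_length, pair=pair)

        def __call__(self, line):
            # line[:-1] holds the text field(s); line[-1] is the label.
            input_ids, valid_length, segment_ids = self._bert_xform(line[:-1])
            label = np.array([self._label_map[line[-1]]], dtype='int32')
            return input_ids, valid_length, segment_ids, label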

@codecov (bot) commented May 4, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@b079bab).
The diff coverage is 100%.

@@           Coverage Diff            @@
##             master    #692   +/-   ##
========================================
  Coverage          ?   89.4%           
========================================
  Files             ?      66           
  Lines             ?    5918           
  Branches          ?       0           
========================================
  Hits              ?    5291           
  Misses            ?     627           
  Partials          ?       0
Flag         Coverage Δ
#PR692       89.4%  <100%> (?)
#notserial   64.65% <50%>  (?)
#py2         89.4%  <100%> (?)
#serial      68.06% <100%> (?)

@codecov (bot) commented May 4, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@b079bab).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master     #692   +/-   ##
=========================================
  Coverage          ?   90.94%           
=========================================
  Files             ?       64           
  Lines             ?     5887           
  Branches          ?        0           
=========================================
  Hits              ?     5354           
  Misses            ?      533           
  Partials          ?        0

@eric-haibin-lin eric-haibin-lin mentioned this pull request May 5, 2019
@tlby left a comment

Good stuff, looks worthwhile! I've hacked in a couple of these features myself downstream (multiprocessing for the BERTDatasetTransform, and GPU selection).
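
For context, the multiprocessing pattern under discussion is the one visible in the traceback further down this thread: pool.map over a transform, wrapped in mx.gluon.data.SimpleDataset. A minimal sketch, assuming the transform is picklable; the helper names preprocess and get_context are illustrative, not from the script.

    import multiprocessing

    import mxnet as mx

    def preprocess(raw_dataset, transform, num_workers=4):
        # Apply the (picklable) transform to each raw row in parallel,
        # mirroring the pool.map call shown in the traceback below.
        with multiprocessing.Pool(num_workers) as pool:
            return mx.gluon.data.SimpleDataset(pool.map(transform, raw_dataset))

    def get_context(gpu_id=None):
        # GPU selection: use the requested device if given, else the CPU.
        return mx.gpu(gpu_id) if gpu_id is not None else mx.cpu()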

EC2 Default User added 2 commits May 6, 2019 23:13
@vanewu (Contributor) left a comment

LGTM 👍

@mli (Member) commented May 7, 2019

Job PR-692/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-692/10/index.html

@szha szha merged commit c9e80b3 into dmlc:master May 7, 2019
astonzhang pushed a commit that referenced this pull request May 10, 2019
* refactor finetune script

* fix test with inference_only

* enhance data preprocessing

* fix label in bert transform

* fix lint

* fix lint

* Update dataset.py

* fix test

* fix test

* do not use bert-adam on mxnet 1.4 (see the sketch after this list)

* use sys.executable

* fix tutorial

* parameter test

* fix typo

* commit a missing line
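
The "do not use bert-adam on mxnet 1.4" commit above implies a version gate on the optimizer. A hedged sketch of that kind of fallback, assuming BERTAdam is only usable on MXNet releases newer than 1.4; the script's actual check may differ.

    import mxnet as mx

    def pick_optimizer(requested='bertadam'):
        # Assumption: fall back to plain Adam on MXNet 1.4, per the
        # commit message above; newer MXNet keeps the requested choice.
        if requested == 'bertadam' and mx.__version__.startswith('1.4'):
            return 'adam'
        return requested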
@pengxin99 (Contributor)

@eric-haibin-lin I tried the new script and hit this error:

INFO:root:processing dataset...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/pengxiny/autotest_gluonnlp/gluon-nlp/scripts/bert/dataset.py", line 432, in __call__
    return self._bert_xform(line)
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/site-packages/gluonnlp-0.6.0-py3.6.egg/gluonnlp/data/transforms.py", line 1089, in __call__
    assert len(line) == 2
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "finetune_classifier.py", line 328, in <module>
    bert_tokenizer, task, batch_size, dev_batch_size, args.max_len, args.pad)
  File "finetune_classifier.py", line 314, in preprocess_data
    data_test = mx.gluon.data.SimpleDataset(pool.map(test_trans, data))
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
AssertionError
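
Reading the trace: the installed gluonnlp-0.6.0 egg's BERTSentenceTransform in pair mode asserts that each line carries exactly two text fields, so a tuple that still includes its label, or a single sentence, trips assert len(line) == 2. A toy reproduction under that assumption; only the stock BERT base vocabulary is loaded.

    import gluonnlp as nlp
    from gluonnlp.data import BERTTokenizer, BERTSentenceTransform

    # Only the vocabulary is needed to build the tokenizer.
    _, vocab = nlp.model.get_model('bert_12_768_12',
                                   dataset_name='book_corpus_wiki_en_uncased',
                                   pretrained=False)
    tokenizer = BERTTokenizer(vocab, lower=True)
    xform = BERTSentenceTransform(tokenizer, max_seq_length=32, pair=True)

    xform(('sentence a', 'sentence b'))  # OK: exactly two text fields
    xform(('just one sentence',))        # AssertionError on gluonnlp 0.6.0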

@szha (Member) commented May 15, 2019

@pengxin99 you may need to install GluonNLP from the master branch first. http://gluon-nlp.mxnet.io/install.html#install-from-github

@eric-haibin-lin (Member Author)

@pengxin99 would you mind sending me the complete command to reproduce it?

@pengxin99 (Contributor)

@szha Thanks. I believe I ran the script with the latest GluonNLP (installed via python setup.py install).

@eric-haibin-lin here are the environment and run command:
OS: CentOS Linux release 7.4.1708 (Core)
mxnet: pip install mxnet-mkl --pre
gluonnlp: python setup.py install (latest version)

finetune: GLUE_DIR=glue_data python finetune_classifier.py --task_name MRPC --batch_size 32 --optimizer bertadam --epochs 3 --lr 2e-5

@pengxin99 (Contributor)

I gave #708 a try, and it works well :)
Thanks, @eric-haibin-lin @szha

paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019
@eric-haibin-lin eric-haibin-lin deleted the finetune branch February 2, 2020 06:20