
[enhancement] refactor bert finetuning script #692

Merged: 16 commits merged into dmlc:master on May 7, 2019

Conversation

@eric-haibin-lin (Member) commented May 3, 2019

Description

Summary of changes:

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc.)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this is a backward-incompatible change, why must it be made?
  • Interesting edge cases to note here

@eric-haibin-lin eric-haibin-lin requested a review from szha as a code owner May 3, 2019 21:56
@eric-haibin-lin (Member Author)

@Gpwner here is the fix for the BERTTransform bug. Thanks for reporting it.
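
For readers following the bug: a minimal sketch of the pattern such a fix implies, namely stripping the label off each row before the text fields reach BERTSentenceTransform, whose pair mode asserts exactly two sentences. The class and attribute names (BERTDatasetTransform, _bert_xform) are taken from the traceback later in this thread, not from the merged diff, so treat this as an illustration rather than the actual change.

    # Hypothetical sketch, not the merged implementation: keep the label
    # out of the tuple handed to BERTSentenceTransform.
    import numpy as np
    from gluonnlp.data import BERTSentenceTransform

    class BERTDatasetTransform(object):
        """Transform (sentence_a[, sentence_b], label) rows for BERT."""

        def __init__(self, tokenizer, max_seq_length, label_map, pair=True):
            self._label_map = label_map
            self._bert_xform = BERTSentenceTransform(
                tokenizer, max_seq_length, pair=pair)

        def __call__(self, line):
            # line[:-1] holds the text field(s); line[-1] is the label.
            input_ids, valid_length, segment_ids = self._bert_xform(line[:-1])
            label = np.array([self._label_map[line[-1]]], dtype='int32')
            return input_ids, valid_length, segment_ids, label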

@codecov (bot) commented May 4, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@b079bab).
The diff coverage is 100%.

@@           Coverage Diff            @@
##             master    #692   +/-   ##
========================================
  Coverage          ?   89.4%           
========================================
  Files             ?      66           
  Lines             ?    5918           
  Branches          ?       0           
========================================
  Hits              ?    5291           
  Misses            ?     627           
  Partials          ?       0
Flag         Coverage Δ
#PR692       89.4%  <100%> (?)
#notserial   64.65% <50%>  (?)
#py2         89.4%  <100%> (?)
#serial      68.06% <100%> (?)

@codecov (bot) commented May 4, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@b079bab).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master     #692   +/-   ##
=========================================
  Coverage          ?   90.94%           
=========================================
  Files             ?       64           
  Lines             ?     5887           
  Branches          ?        0           
=========================================
  Hits              ?     5354           
  Misses            ?      533           
  Partials          ?        0

@eric-haibin-lin eric-haibin-lin mentioned this pull request May 5, 2019
@tlby left a comment

Good stuff, looks worthwhile! I've hacked in a couple of these features myself downstream (multiprocessing for the BERTDatasetTransform, and GPU selection).
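
For context, the multiprocessing pattern under discussion is the one visible in the traceback further down this thread: pool.map over a transform, wrapped in mx.gluon.data.SimpleDataset. A minimal sketch, assuming the transform is picklable; the helper names preprocess and get_context are illustrative, not from the script.

    import multiprocessing

    import mxnet as mx

    def preprocess(raw_dataset, transform, num_workers=4):
        # Apply the (picklable) transform to each raw row in parallel,
        # mirroring the pool.map call shown in the traceback below.
        with multiprocessing.Pool(num_workers) as pool:
            return mx.gluon.data.SimpleDataset(pool.map(transform, raw_dataset))

    def get_context(gpu_id=None):
        # GPU selection: use the requested device if given, else the CPU.
        return mx.gpu(gpu_id) if gpu_id is not None else mx.cpu()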

EC2 Default User added 2 commits May 6, 2019 23:13
@vanewu (Contributor) left a comment

LGTM 👍

@mli (Member) commented May 7, 2019

Job PR-692/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-692/10/index.html

@szha szha merged commit c9e80b3 into dmlc:master May 7, 2019
astonzhang pushed a commit that referenced this pull request May 10, 2019
* refactor finetune script

* fix test with inference_only

* enhance data preprocessing

* fix label in bert transform

* fix lint

* fix lint

* Update dataset.py

* fix test

* fix test

* do not use bert-adam on mxnet 1.4 (see the sketch after this list)

* use sys.executable

* fix tutorial

* parameter test

* fix typo

* commit a missing line
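
The "do not use bert-adam on mxnet 1.4" commit above implies a version gate on the optimizer. A hedged sketch of that kind of fallback, assuming BERTAdam is only usable on MXNet releases newer than 1.4; the script's actual check may differ.

    import mxnet as mx

    def pick_optimizer(requested='bertadam'):
        # Assumption: fall back to plain Adam on MXNet 1.4, per the
        # commit message above; newer MXNet keeps the requested choice.
        if requested == 'bertadam' and mx.__version__.startswith('1.4'):
            return 'adam'
        return requested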
@pengxin99 (Contributor)

@eric-haibin-lin I tried the new script and hit this error:

INFO:root:processing dataset...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/pengxiny/autotest_gluonnlp/gluon-nlp/scripts/bert/dataset.py", line 432, in __call__
    return self._bert_xform(line)
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/site-packages/gluonnlp-0.6.0-py3.6.egg/gluonnlp/data/transforms.py", line 1089, in __call__
    assert len(line) == 2
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "finetune_classifier.py", line 328, in <module>
    bert_tokenizer, task, batch_size, dev_batch_size, args.max_len, args.pad)
  File "finetune_classifier.py", line 314, in preprocess_data
    data_test = mx.gluon.data.SimpleDataset(pool.map(test_trans, data))
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/pengxiny/anaconda3/envs/mxnet-cpu/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
AssertionError
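
Reading the trace: the installed gluonnlp-0.6.0 egg's BERTSentenceTransform in pair mode asserts that each line carries exactly two text fields, so a tuple that still includes its label, or a single sentence, trips assert len(line) == 2. A toy reproduction under that assumption; only the stock BERT base vocabulary is loaded.

    import gluonnlp as nlp
    from gluonnlp.data import BERTTokenizer, BERTSentenceTransform

    # Only the vocabulary is needed to build the tokenizer.
    _, vocab = nlp.model.get_model('bert_12_768_12',
                                   dataset_name='book_corpus_wiki_en_uncased',
                                   pretrained=False)
    tokenizer = BERTTokenizer(vocab, lower=True)
    xform = BERTSentenceTransform(tokenizer, max_seq_length=32, pair=True)

    xform(('sentence a', 'sentence b'))  # OK: exactly two text fields
    xform(('just one sentence',))        # AssertionError on gluonnlp 0.6.0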

@szha (Member) commented May 15, 2019

@pengxin99 you may need to install GluonNLP from the master branch first. http://gluon-nlp.mxnet.io/install.html#install-from-github

@eric-haibin-lin (Member Author)

@pengxin99 would you mind sending me the complete command to reproduce it?

@pengxin99 (Contributor)

@szha Thanks. I believe I ran the script with the latest GluonNLP (installed via python setup.py install).

@eric-haibin-lin here are the environment and run command:
OS: CentOS Linux release 7.4.1708 (Core)
mxnet: pip install mxnet-mkl --pre
gluonnlp: python setup.py install (latest version)

finetune: GLUE_DIR=glue_data python finetune_classifier.py --task_name MRPC --batch_size 32 --optimizer bertadam --epochs 3 --lr 2e-5

@pengxin99 (Contributor)

I gave #708 a try, and it works well :)
Thanks, @eric-haibin-lin @szha

paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019
@eric-haibin-lin eric-haibin-lin deleted the finetune branch February 2, 2020 06:20