[MODEL] BERT based NER #612

Merged · 13 commits merged into dmlc:master on Jun 3, 2019

Conversation

@bikestra bikestra (Contributor) commented Feb 23, 2019

Description

An early implementation of BERT-based NER built on top of @kenjewu 's work. It is not ready to be merged yet, but per the discussion in #593 I wanted to share the current state so that @fierceX , @parry2403 , and others can chime in.

Note that the current code depends on the seqeval package, which I am not sure we should take a dependency on.
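
For context, the entity-level scoring that seqeval provides looks roughly like this (a minimal sketch with made-up tags, not code from this PR):

# Minimal sketch: seqeval computes entity-level (not token-level)
# precision/recall/F1 over BIO-tagged sequences.
from seqeval.metrics import classification_report, f1_score

y_true = [['B-PER', 'I-PER', 'O', 'B-LOC', 'O']]
y_pred = [['B-PER', 'I-PER', 'O', 'O', 'O']]

print(f1_score(y_true, y_pred))           # entity-level F1
print(classification_report(y_true, y_pred))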

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Implement BERT model and basic training loop (see the sketch after this list)
  • [?] Reproduce BERT paper's results
  • Clean up code for training loop, add more logs
  • Remove dependency on seqeval
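
As a rough illustration of the first item in this list, the model is essentially a token-level classifier on top of BERT's per-token encodings. A minimal sketch follows; the class name, constructor arguments, and the choice of bert_12_768_12 are illustrative assumptions rather than this PR's exact code:

import gluonnlp as nlp
from mxnet.gluon import Block, nn

class BERTTaggerSketch(Block):
    """Hypothetical tagger: BERT per-token encodings -> dropout -> tag scores."""
    def __init__(self, bert_model, num_tag_types, dropout_prob=0.1, **kwargs):
        super(BERTTaggerSketch, self).__init__(**kwargs)
        self.bert = bert_model
        with self.name_scope():
            self.dropout = nn.Dropout(rate=dropout_prob)
            self.tag_classifier = nn.Dense(num_tag_types, flatten=False)

    def forward(self, token_ids, token_types, valid_length):
        # (batch, seq_len, hidden) per-token encodings from BERT
        encoded = self.bert(token_ids, token_types, valid_length)
        return self.tag_classifier(self.dropout(encoded))

# Usage sketch: load pre-trained BERT without the pooler/decoder/classifier
# heads so that it returns the per-token sequence encodings.
bert, vocab = nlp.model.get_model('bert_12_768_12',
                                  dataset_name='book_corpus_wiki_en_uncased',
                                  pretrained=True, use_pooler=False,
                                  use_decoder=False, use_classifier=False)
net = BERTTaggerSketch(bert, num_tag_types=9)  # e.g. 9 BIO tags for CoNLL-2003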

Comments

  • The best Test F1 score I got on CoNLL-2003 English is only 90.02, with the parameters I set as defaults.

@mli (Member) commented Feb 23, 2019

Job PR-612/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/1/index.html

@codecov (bot) commented Feb 25, 2019

Codecov Report

Merging #612 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #612   +/-   ##
=======================================
  Coverage   67.07%   67.07%           
=======================================
  Files         140      140           
  Lines       12423    12423           
=======================================
  Hits         8333     8333           
  Misses       4090     4090
Flag Coverage Δ
#PR587 67.33% <0%> (ø) ⬆️
#PR612 66.74% <0%> (ø) ⬆️
#master 67.13% <0%> (ø) ⬆️
#notserial 43.63% <0%> (ø) ⬆️
#py2 67.44% <0%> (ø) ⬆️
#py3 66.91% <0%> (ø) ⬆️
#serial 52.41% <0%> (ø) ⬆️

@codecov (bot) commented Feb 25, 2019

Codecov Report

Merging #612 into master will increase coverage by 0.03%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #612      +/-   ##
==========================================
+ Coverage    90.5%   90.53%   +0.03%     
==========================================
  Files          65       65              
  Lines        6076     6076              
==========================================
+ Hits         5499     5501       +2     
+ Misses        577      575       -2
Impacted Files Coverage Δ
src/gluonnlp/data/dataloader.py 83.62% <0%> (-0.87%) ⬇️
src/gluonnlp/data/utils.py 78.41% <0%> (+2.15%) ⬆️

@mli (Member) commented Feb 25, 2019

Job PR-612/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/2/index.html

@mli (Member) commented Feb 26, 2019

Job PR-612/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/3/index.html

@fierceX fierceX (Member) left a comment on the training loop:

v.wd_mult = 0.0
params = [p for p in net.collect_params().values() if p.grad_req != 'null']

def train(data_loader):

It may be better to compute the loss only over the effective length of out and tag_ids.
This is how it is done in my implementation:

max_valid_length = max(valid_length).asscalar()
out = out[:, 1:max_valid_length-1, :]
label = label[:, 1:max_valid_length-1]
loss_value = loss_function(out, label.astype('float32').as_in_context(ctx)).mean()

@bikestra (Contributor, Author) replied

Do you mean it would be computationally more efficient, because the loss would be computed over a smaller number of tags? If you are talking about correctness, I am providing sample weights to zero out the loss on tokens we don't have to predict: https://github.com/dmlc/gluon-nlp/pull/612/files/cf7cbc87e3cb2ed83cac6fa3e065579ee0a98540#diff-aa85770855ab2b6db7a5b29c232749efR138
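
To make the sample-weight point concrete, here is a minimal, self-contained sketch of that kind of masking; the shapes, tag count, and the flag_nonnull_tag name are illustrative assumptions, not the PR's exact code:

from mxnet import nd, gluon

batch_size, seq_len, num_tags = 2, 8, 9
out = nd.random.uniform(shape=(batch_size, seq_len, num_tags))   # model scores
tag_ids = nd.random.randint(0, num_tags, shape=(batch_size, seq_len)).astype('float32')

# 1.0 where a real tag must be predicted, 0.0 elsewhere; here positions 0 and
# seq_len-1 stand in for [CLS]/[SEP].
flag_nonnull_tag = nd.ones(shape=(batch_size, seq_len))
flag_nonnull_tag[:, 0] = 0
flag_nonnull_tag[:, seq_len - 1] = 0

loss_function = gluon.loss.SoftmaxCrossEntropyLoss()
# sample_weight zeroes the per-token losses we do not care about, so slicing
# out/tag_ids to the valid length is not needed for correctness.
loss = loss_function(out, tag_ids, flag_nonnull_tag.expand_dims(axis=2))
print(loss.mean().asscalar())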

@mli (Member) commented Mar 10, 2019

Job PR-612/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/1/index.html

@mli (Member) commented Apr 3, 2019

Job PR-612/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/6/index.html

@szha szha requested a review from hankcs April 5, 2019 20:47
@hankcs hankcs (Contributor) left a comment

Great, I have a couple of notes there. Most concerns are related to the CRF code, which is actually not used in the BERT model. You can ignore them since CRF is not the main topic.

@mli (Member) commented Apr 14, 2019

Job PR-612/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/7/index.html

@bikestra (Contributor, Author) commented

@hankcs Thanks for your comments! Most of the comments seem to be on code from @kenjewu 's change in #466 . I created this new PR (#612) based on #466 , but since #466 has been open for a while, the situation is becoming awkward. Also, this PR uses very few functions from #466 , just a handful of data processing utility functions.

Here are some options:

@szha Any thoughts?

@szha (Member) commented Apr 15, 2019

@bikestra let's go with the first option: treat all the utility functions required here as new functionality for review purposes, and try to merge this PR first.

@bikestra (Contributor, Author) commented
@szha sounds good. I will proceed in that direction.

@szha szha mentioned this pull request Apr 23, 2019
@bikestra (Contributor, Author) commented
I removed the code carried over from @kenjewu 's previous named entity recognition implementation, so there is no longer any irrelevant code here.

@hankcs hankcs (Contributor) left a comment

Looks good to me. Shall we design an interface for ordinary users which takes a string as input and outputs the list of named entities in it?

@bikestra (Contributor, Author) commented
@hankcs that would be fun to have. I will add it :)
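
For reference, one plausible shape for the decoding half of such an interface is sketched below; the helper name and behaviour are hypothetical, not code in this PR. It turns the tagger's BIO output for a tokenized string into entity spans:

def bio_tags_to_entities(tokens, tags):
    """Hypothetical helper: group BIO tags into (entity_text, entity_type) pairs."""
    entities, current_tokens, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag == 'O':
            if current_tokens:                      # close any open entity
                entities.append((' '.join(current_tokens), current_type))
            current_tokens, current_type = [], None
        elif tag.startswith('B-') or tag[2:] != current_type:
            # a B- tag, or an I- tag that does not continue the open entity,
            # starts a new entity (closing the previous one first)
            if current_tokens:
                entities.append((' '.join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        else:                                       # continuation of the open entity
            current_tokens.append(token)
    if current_tokens:
        entities.append((' '.join(current_tokens), current_type))
    return entities

print(bio_tags_to_entities(
    ['Barack', 'Obama', 'visited', 'Paris', '.'],
    ['B-PER', 'I-PER', 'O', 'B-LOC', 'O']))
# [('Barack Obama', 'PER'), ('Paris', 'LOC')]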

(Inline review thread on this snippet from the PR:)

--optimizer bertadam --bert-model bert_24_1024_16 \
--save-checkpoint-prefix ${MODEL_DIR}/large_bert --seed 13531

This achieves a Test F1 ranging from `91.5` to `92.2`.
@szha szha (Member) commented May 7, 2019

@bikestra could you upload a training log for this to dmlc/web-data and link it here?

@eric-haibin-lin eric-haibin-lin added the "release focus" label (Progress focus for release) on May 10, 2019
@eric-haibin-lin (Member) commented

@bikestra the CI was recently updated. Would you mind syncing with master? Thanks!

@bikestra (Contributor, Author) commented

I will add the log and address the code style problems flagged at http://ci.mxnet.io/blue/organizations/jenkins/gluon-nlp/detail/PR-612/12/pipeline . @szha suggested that providing a command-line interface may not be within the scope of this PR, so I will wrap up this PR without that feature.

@mli (Member) commented May 13, 2019

Job PR-612/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/1/index.html

@eric-haibin-lin eric-haibin-lin (Member) left a comment

@bikestra do you think it's possible to add some tests for the script, so that the functionality is not broken in the future? For example, training the model on the dev set for 1 epoch.
We have a collection of such tests in https://github.com/dmlc/gluon-nlp/blob/master/scripts/tests/test_scripts.py
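
A rough sketch of the kind of smoke test meant here; the script path, flags, data file, and pytest mark are assumptions, not the actual CLI added in this PR:

import subprocess
import sys

import pytest


@pytest.mark.serial  # mark name only mirrors the existing suite's convention
def test_ner_finetune_one_epoch(tmp_path):
    """Run a single training epoch on a tiny dataset to catch breakage early."""
    cmd = [
        sys.executable, 'scripts/ner/finetune_bert.py',  # hypothetical path
        '--train-path', 'tests/data/ner/tiny.tsv',        # hypothetical data
        '--dev-path', 'tests/data/ner/tiny.tsv',
        '--num-epochs', '1',
        '--save-checkpoint-prefix', str(tmp_path / 'ner'),
    ]
    subprocess.check_call(cmd)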

@mli (Member) commented May 14, 2019

Job PR-612/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/2/index.html

@bikestra (Contributor, Author) commented

@eric-haibin-lin I really like the idea of adding some tests, but the CoNLL datasets are not public. Most popular NER datasets are not public either, but maybe we can add something like WikiNER https://github.com/dice-group/FOX/tree/master/input/Wikiner , which is CC-BY 3.0.

@mli (Member) commented Jun 2, 2019

Job PR-612/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/3/index.html

@mli (Member) commented Jun 2, 2019

Job PR-612/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/4/index.html

@mli (Member) commented Jun 2, 2019

Job PR-612/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/5/index.html

@mli (Member) commented Jun 3, 2019

Job PR-612/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/7/index.html

@eric-haibin-lin eric-haibin-lin merged commit 64030ab into dmlc:master Jun 3, 2019
paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019
* add BERT model for NER

* addressed pylint complaints

* removed syntax error on double quotes

* address code style complaints from lint

* make the code compatible with python 2.7 by removing type annotation on str2bool()

* rename file and update readme

* fix lint (round 1)

* fix lint (round 2)

* fix lint

* fix import