[MODEL] BERT based NER #612

Merged · 13 commits merged into dmlc:master on Jun 3, 2019

Conversation

@bikestra bikestra (Contributor) commented Feb 23, 2019

Description

An early implementation of BERT-based NER built on top of @kenjewu 's work. It is not ready to be merged yet, but per the discussion in #593 I wanted to share the current state so that @fierceX , @parry2403 , and others can chime in.

Note that the current code depends on the seqeval package, which I am not sure we should take a dependency on.
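
For context, the entity-level scoring that seqeval provides looks roughly like this (a minimal sketch with made-up tags, not code from this PR):

# Minimal sketch: seqeval computes entity-level (not token-level)
# precision/recall/F1 over BIO-tagged sequences.
from seqeval.metrics import classification_report, f1_score

y_true = [['B-PER', 'I-PER', 'O', 'B-LOC', 'O']]
y_pred = [['B-PER', 'I-PER', 'O', 'O', 'O']]

print(f1_score(y_true, y_pred))           # entity-level F1
print(classification_report(y_true, y_pred))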

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Implement BERT model and basic training loop (see the sketch after this list)
  • [?] Reproduce BERT paper's results
  • Clean up code for training loop, add more logs
  • Remove dependency on seqeval
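
As a rough illustration of the first item in this list, the model is essentially a token-level classifier on top of BERT's per-token encodings. A minimal sketch follows; the class name, constructor arguments, and the choice of bert_12_768_12 are illustrative assumptions rather than this PR's exact code:

import gluonnlp as nlp
from mxnet.gluon import Block, nn

class BERTTaggerSketch(Block):
    """Hypothetical tagger: BERT per-token encodings -> dropout -> tag scores."""
    def __init__(self, bert_model, num_tag_types, dropout_prob=0.1, **kwargs):
        super(BERTTaggerSketch, self).__init__(**kwargs)
        self.bert = bert_model
        with self.name_scope():
            self.dropout = nn.Dropout(rate=dropout_prob)
            self.tag_classifier = nn.Dense(num_tag_types, flatten=False)

    def forward(self, token_ids, token_types, valid_length):
        # (batch, seq_len, hidden) per-token encodings from BERT
        encoded = self.bert(token_ids, token_types, valid_length)
        return self.tag_classifier(self.dropout(encoded))

# Usage sketch: load pre-trained BERT without the pooler/decoder/classifier
# heads so that it returns the per-token sequence encodings.
bert, vocab = nlp.model.get_model('bert_12_768_12',
                                  dataset_name='book_corpus_wiki_en_uncased',
                                  pretrained=True, use_pooler=False,
                                  use_decoder=False, use_classifier=False)
net = BERTTaggerSketch(bert, num_tag_types=9)  # e.g. 9 BIO tags for CoNLL-2003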

Comments

  • The best Test F1 score I got on CoNLL-2003 English is only 90.02, with the parameters I set as defaults.

@mli (Member) commented Feb 23, 2019

Job PR-612/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/1/index.html

@codecov (bot) commented Feb 25, 2019

Codecov Report

Merging #612 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #612   +/-   ##
=======================================
  Coverage   67.07%   67.07%           
=======================================
  Files         140      140           
  Lines       12423    12423           
=======================================
  Hits         8333     8333           
  Misses       4090     4090
Flag Coverage Δ
#PR587 67.33% <0%> (ø) ⬆️
#PR612 66.74% <0%> (ø) ⬆️
#master 67.13% <0%> (ø) ⬆️
#notserial 43.63% <0%> (ø) ⬆️
#py2 67.44% <0%> (ø) ⬆️
#py3 66.91% <0%> (ø) ⬆️
#serial 52.41% <0%> (ø) ⬆️

@codecov (bot) commented Feb 25, 2019

Codecov Report

Merging #612 into master will increase coverage by 0.03%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #612      +/-   ##
==========================================
+ Coverage    90.5%   90.53%   +0.03%     
==========================================
  Files          65       65              
  Lines        6076     6076              
==========================================
+ Hits         5499     5501       +2     
+ Misses        577      575       -2
Impacted Files Coverage Δ
src/gluonnlp/data/dataloader.py 83.62% <0%> (-0.87%) ⬇️
src/gluonnlp/data/utils.py 78.41% <0%> (+2.15%) ⬆️

@mli (Member) commented Feb 25, 2019

Job PR-612/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/2/index.html

@mli (Member) commented Feb 26, 2019

Job PR-612/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/3/index.html

@fierceX fierceX (Member) left a comment on the training loop:

v.wd_mult = 0.0
params = [p for p in net.collect_params().values() if p.grad_req != 'null']

def train(data_loader):

It may be better to compute the loss only over the effective length of out and tag_ids.
This is how it is done in my implementation:

max_valid_length = max(valid_length).asscalar()
out = out[:, 1:max_valid_length-1, :]
label = label[:, 1:max_valid_length-1]
loss_value = loss_function(out, label.astype('float32').as_in_context(ctx)).mean()

@bikestra (Contributor, Author) replied

Do you mean it would be computationally more efficient, because the loss would be computed over a smaller number of tags? If you are talking about correctness, I am providing sample weights to zero out the loss on tokens we don't have to predict: https://github.com/dmlc/gluon-nlp/pull/612/files/cf7cbc87e3cb2ed83cac6fa3e065579ee0a98540#diff-aa85770855ab2b6db7a5b29c232749efR138
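
To make the sample-weight point concrete, here is a minimal, self-contained sketch of that kind of masking; the shapes, tag count, and the flag_nonnull_tag name are illustrative assumptions, not the PR's exact code:

from mxnet import nd, gluon

batch_size, seq_len, num_tags = 2, 8, 9
out = nd.random.uniform(shape=(batch_size, seq_len, num_tags))   # model scores
tag_ids = nd.random.randint(0, num_tags, shape=(batch_size, seq_len)).astype('float32')

# 1.0 where a real tag must be predicted, 0.0 elsewhere; here positions 0 and
# seq_len-1 stand in for [CLS]/[SEP].
flag_nonnull_tag = nd.ones(shape=(batch_size, seq_len))
flag_nonnull_tag[:, 0] = 0
flag_nonnull_tag[:, seq_len - 1] = 0

loss_function = gluon.loss.SoftmaxCrossEntropyLoss()
# sample_weight zeroes the per-token losses we do not care about, so slicing
# out/tag_ids to the valid length is not needed for correctness.
loss = loss_function(out, tag_ids, flag_nonnull_tag.expand_dims(axis=2))
print(loss.mean().asscalar())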

@mli (Member) commented Mar 10, 2019

Job PR-612/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/1/index.html

@mli (Member) commented Apr 3, 2019

Job PR-612/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/6/index.html

@szha szha requested a review from hankcs April 5, 2019 20:47
@hankcs hankcs (Contributor) left a comment

Great, I have a couple of notes there. Most concerns are related to the CRF code, which is actually not used in the BERT model. You can ignore them since CRF is not the main topic.

@mli (Member) commented Apr 14, 2019

Job PR-612/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/7/index.html

@bikestra (Contributor, Author) commented

@hankcs Thanks for your comments! Most of the comments seem to be on code from @kenjewu 's change in #466 . I created this new PR (#612) based on #466 , but since #466 has been open for a while, the situation is becoming awkward. Also, this PR uses very few functions from #466 , just a handful of data processing utility functions.

Here are some options:

@szha Any thoughts?

@szha (Member) commented Apr 15, 2019

@bikestra let's go with the first option: treat all the utility functions required here as new functionality for review purposes, and try to merge this PR first.

@bikestra (Contributor, Author) commented
@szha sounds good. I will proceed in that direction.

@szha szha mentioned this pull request Apr 23, 2019
@bikestra (Contributor, Author) commented
I removed the code carried over from @kenjewu 's previous named entity recognition implementation, so there is no longer any irrelevant code here.

@hankcs hankcs (Contributor) left a comment

Looks good to me. Shall we design an interface for ordinary users which takes a string as input and outputs the list of named entities in it?

@bikestra (Contributor, Author) commented
@hankcs that would be fun to have. I will add it :)
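
For reference, one plausible shape for the decoding half of such an interface is sketched below; the helper name and behaviour are hypothetical, not code in this PR. It turns the tagger's BIO output for a tokenized string into entity spans:

def bio_tags_to_entities(tokens, tags):
    """Hypothetical helper: group BIO tags into (entity_text, entity_type) pairs."""
    entities, current_tokens, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag == 'O':
            if current_tokens:                      # close any open entity
                entities.append((' '.join(current_tokens), current_type))
            current_tokens, current_type = [], None
        elif tag.startswith('B-') or tag[2:] != current_type:
            # a B- tag, or an I- tag that does not continue the open entity,
            # starts a new entity (closing the previous one first)
            if current_tokens:
                entities.append((' '.join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        else:                                       # continuation of the open entity
            current_tokens.append(token)
    if current_tokens:
        entities.append((' '.join(current_tokens), current_type))
    return entities

print(bio_tags_to_entities(
    ['Barack', 'Obama', 'visited', 'Paris', '.'],
    ['B-PER', 'I-PER', 'O', 'B-LOC', 'O']))
# [('Barack Obama', 'PER'), ('Paris', 'LOC')]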

(Inline review thread on this snippet from the PR:)

--optimizer bertadam --bert-model bert_24_1024_16 \
--save-checkpoint-prefix ${MODEL_DIR}/large_bert --seed 13531

This achieves a Test F1 ranging from `91.5` to `92.2`.
@szha szha (Member) commented May 7, 2019

@bikestra could you upload a training log for this to dmlc/web-data and link it here?

@eric-haibin-lin eric-haibin-lin added the "release focus" label (Progress focus for release) on May 10, 2019
@eric-haibin-lin (Member) commented

@bikestra the CI was recently updated. Would you mind syncing with master? Thanks!

@bikestra (Contributor, Author) commented

I will add the log and address the code style problems flagged at http://ci.mxnet.io/blue/organizations/jenkins/gluon-nlp/detail/PR-612/12/pipeline . @szha suggested that providing a command-line interface may not be within the scope of this PR, so I will wrap up this PR without that feature.

@mli (Member) commented May 13, 2019

Job PR-612/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/1/index.html

@eric-haibin-lin eric-haibin-lin (Member) left a comment

@bikestra do you think it's possible to add some tests for the script, so that the functionality is not broken in the future? For example, training the model on the dev set for 1 epoch.
We have a collection of such tests in https://github.com/dmlc/gluon-nlp/blob/master/scripts/tests/test_scripts.py
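
A rough sketch of the kind of smoke test meant here; the script path, flags, data file, and pytest mark are assumptions, not the actual CLI added in this PR:

import subprocess
import sys

import pytest


@pytest.mark.serial  # mark name only mirrors the existing suite's convention
def test_ner_finetune_one_epoch(tmp_path):
    """Run a single training epoch on a tiny dataset to catch breakage early."""
    cmd = [
        sys.executable, 'scripts/ner/finetune_bert.py',  # hypothetical path
        '--train-path', 'tests/data/ner/tiny.tsv',        # hypothetical data
        '--dev-path', 'tests/data/ner/tiny.tsv',
        '--num-epochs', '1',
        '--save-checkpoint-prefix', str(tmp_path / 'ner'),
    ]
    subprocess.check_call(cmd)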

@mli (Member) commented May 14, 2019

Job PR-612/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/2/index.html

@bikestra (Contributor, Author) commented

@eric-haibin-lin I really like the idea of adding some tests, but the CoNLL datasets are not public. Most popular NER datasets are not public either, but maybe we can add something like WikiNER https://github.com/dice-group/FOX/tree/master/input/Wikiner , which is CC-BY 3.0.

@mli (Member) commented Jun 2, 2019

Job PR-612/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/3/index.html

@mli (Member) commented Jun 2, 2019

Job PR-612/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/4/index.html

@mli (Member) commented Jun 2, 2019

Job PR-612/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/5/index.html

@mli (Member) commented Jun 3, 2019

Job PR-612/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-612/7/index.html

@eric-haibin-lin eric-haibin-lin merged commit 64030ab into dmlc:master Jun 3, 2019
paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019
* add BERT model for NER

* addressed pylint complaints

* removed syntax error on double quotes

* address code style complaints from lint

* make the code compatible with python 2.7 by removing type annotation on str2bool()

* rename file and update readme

* fix lint (round 1)

* fix lint (round 2)

* fix lint

* fix import