Conversation
Job PR-612/1 is complete.
Codecov Report
@@ Coverage Diff @@
## master #612 +/- ##
=======================================
Coverage 67.07% 67.07%
=======================================
Files 140 140
Lines 12423 12423
=======================================
Hits 8333 8333
Misses 4090 4090
Codecov Report
@@ Coverage Diff @@
## master #612 +/- ##
==========================================
+ Coverage 90.5% 90.53% +0.03%
==========================================
Files 65 65
Lines 6076 6076
==========================================
+ Hits 5499 5501 +2
+ Misses 577 575 -2
Job PR-612/2 is complete.
Job PR-612/3 is complete.
It may be better to only calculate the effective length of `out` and `tag_ids`. This is the case in my implementation:
max_valid_length = int(max(valid_length).asscalar())  # cast to int so it can be used as a slice index
out = out[:, 1:max_valid_length-1, :]
label = label[:, 1:max_valid_length-1]
loss_value = loss_function(out, label.astype('float32').as_in_context(ctx)).mean()
v.wd_mult = 0.0
params = [p for p in net.collect_params().values() if p.grad_req != 'null']

def train(data_loader):
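The `wd_mult = 0.0` line in the diff above disables weight decay for selected parameters. A common convention in BERT fine-tuning (assumed here; the exact selection rule is not shown in this diff) is to exclude biases and LayerNorm parameters from weight decay. A framework-agnostic sketch of that selection:

```python
def no_decay(param_name):
    """Heuristic: skip weight decay for bias and LayerNorm parameters.

    The name patterns below are an assumption following the common BERT
    fine-tuning convention, not taken from this PR's diff.
    """
    return any(token in param_name for token in ('bias', 'layernorm', 'gamma', 'beta'))

def split_params(param_names):
    """Partition parameter names into decayed and non-decayed groups."""
    decay = [n for n in param_names if not no_decay(n)]
    skip = [n for n in param_names if no_decay(n)]
    return decay, skip
```

In Gluon, the `skip` group would be the parameters whose `wd_mult` is set to `0.0`, as in the diff above.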
Do you mean it would be computationally more efficient, because the loss will be computed on a smaller number of tags? If you are talking about correctness, I am providing sample weights to zero out loss values on tokens which we don't have to predict: https://github.com/dmlc/gluon-nlp/pull/612/files/cf7cbc87e3cb2ed83cac6fa3e065579ee0a98540#diff-aa85770855ab2b6db7a5b29c232749efR138
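For readers following the exchange above: truncating to the maximum valid length shrinks the tensors before the loss, while per-token sample weights keep the shapes and zero out the loss on padding. The two can be illustrated with plain NumPy; this is only a hedged sketch of the idea, not the PR's actual loss code.

```python
import numpy as np

# Toy per-token losses for a batch of 2 sequences, max length 5.
token_loss = np.array([[0.5, 0.2, 0.3, 0.9, 0.8],
                       [0.4, 0.1, 0.7, 0.6, 0.2]])
valid_length = np.array([3, 2])  # number of real (non-padding) tokens per sequence

# Approach 1: sample weights zero out the loss on padding tokens.
mask = np.arange(token_loss.shape[1])[None, :] < valid_length[:, None]
weighted_mean = (token_loss * mask).sum() / mask.sum()

# Approach 2: truncate to the effective max length first; positions inside
# [0, max_valid_length) that are still padding must be masked as well.
max_valid_length = int(valid_length.max())
truncated = token_loss[:, :max_valid_length]
truncated_mask = mask[:, :max_valid_length]
truncated_mean = (truncated * truncated_mask).sum() / truncated_mask.sum()
```

Both routes average the same set of valid-token losses; truncation merely avoids multiplying and summing over columns that are entirely padding.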
Job PR-612/6 is complete.
Great, I have a couple of notes there. Most concerns are related to the CRF code, which is actually not used in the BERT model. You can ignore them since CRF is not the main topic.
scripts/named_entity_recognition/code/algorithms/cnn_bilstm_crf.py
Job PR-612/7 is complete.
@hankcs Thanks for your comments! Most of the comments seem to be from @kenjewu's change here #466. I created a new PR #612 based on #466, but since #466 has been open for a while, the situation is becoming awkward. Also, this PR uses very few functions from #466, just a handful of data processing utility functions. Here are some options:
@szha Any thoughts?
@bikestra let's go with the first option: treat all the utility functions required here as new functionality under the review standard, and try to merge this PR first.
@szha sounds good. I will proceed in that direction.
I removed code from the previous named entity recognition code by @kenjewu, so we don't have irrelevant code anymore.
Looks good to me. Shall we design an interface for normal users, which takes a string as input and outputs a list of named entities in it?
@hankcs that would be fun to have. I will add it :)
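Such an interface would presumably wrap tokenization, model prediction, and decoding of the BIO tag sequence into entity spans. As a hedged sketch of the decoding half (function and tag names are illustrative, not from this PR):

```python
def bio_to_entities(tokens, tags):
    """Decode BIO tags into (entity_type, text) pairs.

    `tokens` and `tags` are parallel lists; tags look like 'B-PER',
    'I-PER', 'O'. Illustrative helper only, not code from the PR.
    """
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith('B-'):
            # A new entity begins; flush any entity in progress.
            if current_tokens:
                entities.append((current_type, ' '.join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith('I-') and current_type == tag[2:]:
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # 'O' tag or an inconsistent 'I-' tag ends the entity.
            if current_tokens:
                entities.append((current_type, ' '.join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, ' '.join(current_tokens)))
    return entities
```

A user-facing `predict(text)` could then tokenize, run the model, and feed the predicted tags through a decoder like this.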
scripts/bert/index.rst
--optimizer bertadam --bert-model bert_24_1024_16 \
--save-checkpoint-prefix ${MODEL_DIR}/large_bert --seed 13531

This achieves a Test F1 ranging from `91.5` to `92.2`.
@bikestra could you upload a training log for this to dmlc/web-data and link it here?
@bikestra the CI was recently updated. Would you mind syncing with master? Thanks!
Will add the log, and address the code style problems per http://ci.mxnet.io/blue/organizations/jenkins/gluon-nlp/detail/PR-612/12/pipeline . @szha suggested that providing a command line interface may not be within the scope of this PR, so I will wrap up this PR without the feature.
@bikestra do you think it's possible to add some tests for the script, so that the functionality is not broken in the future? For example, training the model on the dev set for 1 epoch.
We have a collection of such tests in https://github.com/dmlc/gluon-nlp/blob/master/scripts/tests/test_scripts.py
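A smoke test along those lines would typically just assemble a 1-epoch command line for the training script and assert the run exits cleanly. The script name and flag names below are placeholders (the actual CLI in `scripts/named_entity_recognition` may differ); shown here only as a sketch of what such a test would construct:

```python
def build_smoke_test_cmd(script='finetune_ner.py', epochs=1, dataset_path='dev.txt'):
    """Assemble a command line for a 1-epoch smoke run.

    The script name and flag names are hypothetical placeholders; a real
    test (as in scripts/tests/test_scripts.py) would use the script's
    actual arguments and run the command with subprocess, asserting a
    zero exit code.
    """
    return ['python', script,
            '--epochs', str(epochs),
            '--train-path', dataset_path,
            '--dev-path', dataset_path]
```

Training on the dev set for a single epoch keeps the test fast while still exercising the full data-loading, training, and evaluation path.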
@eric-haibin-lin I really like the idea of adding some tests, but the CoNLL datasets are not public. Popular datasets for NER are generally not public, but maybe we can add something like WikiNER https://github.com/dice-group/FOX/tree/master/input/Wikiner , which is CC-BY 3.0.
Job PR-612/4 is complete.
Job PR-612/5 is complete.
* add BERT model for NER
* addressed pylint complaints
* removed syntax error on double quotes
* address code style complaints from lint
* make the code compatible with python 2.7 by removing type annotation on str2bool()
* rename file and update readme
* fix lint (round 1)
* fix lint (round 2)
* fix lint
* fix import
Description
Early implementation of BERT-based NER built on top of @kenjewu's work. Not ready to be merged, but per the discussion in #593 I wanted to share the current state so that @fierceX, @parry2403 and others can chime in.
Note that the current code depends on the `seqeval` package, which I am not sure we should take a dependency on.

Checklist
Essentials
Changes
seqeval
Comments
The best Test F1 score I got on CoNLL 2003 English is only 90.02, with the parameters I set as defaults.