AllenNLP SRL predictions are inconsistent #12

Closed
deanyan7 opened this issue Jun 27, 2020 · 7 comments

@deanyan7

Hello. When I run SRL prediction directly on the raw data, the results do not match the test samples you provide.

For example: The new rights are nice enough

The provided test sample gives {"verbs": [{"verb": "are", "description": "[ARG1: The new rights] [V: are] [ARG2: nice enough]", "tags": ["B-ARG1", "I-ARG1", "I-ARG1", "B-V", "B-ARG2", "I-ARG2"]}], "words": ["The", "new", "rights", "are", "nice", "enough"]}

whereas AllenNLP predicts [{'verbs': [], 'words': ['The', 'new', 'rights', 'are', 'nice', 'enough']}]

allennlp 0.8.1 allennlp-models=1.0.0
Also tested with allennlp 1.0.0 allennlp-models=1.0.0
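
For readers comparing the two outputs, it may help to see how the "tags" list relates to the "description" string. The following is a minimal sketch, assuming the tags follow the usual BIO convention; `describe` is a hypothetical helper written for this illustration and is not AllenNLP's own code.

```python
from typing import List


def describe(words: List[str], tags: List[str]) -> str:
    """Render BIO tags as bracketed spans, e.g. "[V: are]"."""
    chunks: List[str] = []      # finished pieces of the description
    span_label = None           # label of the span being built, if any
    span_words: List[str] = []  # words collected for that span

    def flush() -> None:
        # Close the current span (if one is open) and emit it.
        nonlocal span_label, span_words
        if span_label is not None:
            chunks.append(f"[{span_label}: {' '.join(span_words)}]")
        span_label, span_words = None, []

    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            flush()
            span_label, span_words = tag[2:], [word]
        elif tag.startswith("I-") and span_label == tag[2:]:
            span_words.append(word)
        else:
            # "O", or an I- tag that does not continue the open span,
            # is emitted as a plain word.
            flush()
            chunks.append(word)
    flush()
    return " ".join(chunks)


words = ["The", "new", "rights", "are", "nice", "enough"]
tags = ["B-ARG1", "I-ARG1", "I-ARG1", "B-V", "B-ARG2", "I-ARG2"]
print(describe(words, tags))
```

An empty "verbs" list, as in the AllenNLP output above, means no tag sequence was produced at all, so there is nothing to render.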

@cooelf
Owner

cooelf commented Jun 27, 2020

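This prediction suggests the model never actually ran on a predicate. As I recall, an all-empty result appears when no verb is correctly identified.
Did you use the provided data processing (online or offline version)? Please share the detailed steps so I can reproduce it.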

@deanyan7
Author

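Both versions give the same result. This is just one sample: the model fails to predict copular or auxiliary verbs such as do, is, was, and were, but for other verbs the results look quite good. I will post the detailed steps shortly so you can reproduce it.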

@cooelf
Owner

cooelf commented Jun 27, 2020

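AllenNLP identifies verbs via spaCy. Can you confirm whether you are using the earlier ELMo model or the BERT one?

The BERT-based AllenNLP demo indeed fails to identify such verbs as well (https://demo.allennlp.org/semantic-role-labeling).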

Related Issue: allenai/allennlp#4146

@deanyan7
Author

deanyan7 commented Jun 27, 2020

The PyTorch version is 1.5.0.
I used the srl-model-2018.05.25.tar.gz you provided with allennlp==0.8.1 spacy==2.2.4, and also the bert-base-srl-2020.03.24.tar.gz provided by allennlp-demo with allennlp==1.0.0 allennlp-models==1.0.0. The problem occurs in both setups.

@deanyan7
Author

deanyan7 commented Jun 27, 2020

To reproduce:
allennlp==0.8.1 spacy==2.2.4
from allennlp.models import load_archive
from allennlp.predictors import Predictor
archive = load_archive("/model/srl-model-2018.05.25.tar.gz", cuda_device=0)
predictor = Predictor.from_archive(archive)
predictor.predict(sentence)  # sentence is the string given further down

or
allennlp==1.0.0 allennlp-models==1.0.0
from allennlp.models import load_archive
from allennlp.predictors import Predictor
archive = load_archive("model/bert-base-srl-2020.03.24.tar.gz", cuda_device=0)
predictor = Predictor.from_archive(archive)
predictor.predict(sentence)  # sentence is the string given further down

sentence = "yeah i know and i did that all through college and it worked too"
result = {'verbs': [{'verb': 'know', 'description': 'yeah [ARG0: i] [V: know] and i did that all through college and it worked too', 'tags': ['O', 'B-ARG0', 'B-V', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']},
{'verb': 'worked', 'description': 'yeah i know and i did that all through college and [ARG1: it] [V: worked] [ARGM-ADV: too]', 'tags': ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ARG1', 'B-V', 'B-ARGM-ADV']}], 'words': ['yeah', 'i', 'know', 'and', 'i', 'did', 'that', 'all', 'through', 'college', 'and', 'it', 'worked', 'too']}

Sample result:
{"verbs": [{"verb": "know", "description": "yeah [ARG0: i] [V: know] and i did that all through college and it worked too", "tags": ["O", "B-ARG0", "B-V", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"]},
{"verb": "did", "description": "yeah i know and [ARG0: i] [V: did] [ARG1: that] [ARGM-TMP: all through college] and it worked too", "tags": ["O", "O", "O", "O", "B-ARG0", "B-V", "B-ARG1", "B-ARGM-TMP", "I-ARGM-TMP", "I-ARGM-TMP", "O", "O", "O", "O"]},
{"verb": "worked", "description": "yeah i know and i did that all through college and [ARG0: it] [V: worked] [ARGM-ADV: too]", "tags": ["O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "B-ARG0", "B-V", "B-ARGM-ADV"]}], "words": ["yeah", "i", "know", "and", "i", "did", "that", "all", "through", "college", "and", "it", "worked", "too"]}
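
One quick way to see which predicates were dropped is to compare the "verb" fields of the two outputs. This is a minimal sketch using trimmed copies of the results above (only the "verb" key is kept); `missing_predicates` is a hypothetical helper, and matching by verb string would conflate a verb that occurs twice in one sentence.

```python
from typing import Dict, List


def missing_predicates(expected: Dict, predicted: Dict) -> List[str]:
    """Return verbs that have a frame in `expected` but none in `predicted`."""
    found = {frame["verb"] for frame in predicted["verbs"]}
    return [frame["verb"] for frame in expected["verbs"] if frame["verb"] not in found]


# Trimmed versions of the outputs quoted in this comment.
predicted = {"verbs": [{"verb": "know"}, {"verb": "worked"}]}
expected = {"verbs": [{"verb": "know"}, {"verb": "did"}, {"verb": "worked"}]}

print(missing_predicates(expected, predicted))
```

On this sentence the only missing frame is the one for "did", which fits the pattern of auxiliary-type verbs being skipped.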

@cooelf
Owner

cooelf commented Jun 27, 2020

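I tried different spaCy versions and found that they assign the verb tag differently, which probably causes the SRL model to miss those predicates. You can switch to an earlier spaCy version (for example 2.0.18, then reinstall the model with python -m spacy download en_core_web_sm).

See allenai/allennlp#3418

Test example: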

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("The new rights are nice enough")
print([token.text for token in doc])
print([token.pos_ for token in doc])

Output comparison:

spacy 2.0.18
['yeah', 'i', 'know', 'and', 'i', 'did', 'that', 'all', 'through', 'college', 'and', 'it', 'worked', 'too']
['INTJ', 'PRON', 'VERB', 'CCONJ', 'PRON', 'VERB', 'DET', 'DET', 'ADP', 'NOUN', 'CCONJ', 'PRON', 'VERB', 'ADV']

['The', 'new', 'rights', 'are', 'nice', 'enough']
['DET', 'ADJ', 'NOUN', 'VERB', 'ADJ', 'ADV']

spacy 2.2.4
['The', 'new', 'rights', 'are', 'nice', 'enough']
['DET', 'ADJ', 'NOUN', 'AUX', 'ADJ', 'ADV']

['yeah', 'i', 'know', 'and', 'i', 'did', 'that', 'all', 'through', 'college', 'and', 'it', 'worked', 'too']
['INTJ', 'PRON', 'VERB', 'CCONJ', 'PRON', 'AUX', 'SCONJ', 'DET', 'ADP', 'NOUN', 'CCONJ', 'PRON', 'VERB', 'ADV']
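
The effect of the tag change can be sketched without spaCy at all. The snippet below assumes that the predictor treats only tokens tagged VERB as predicates, which is what this thread suggests but does not show; `predicate_indices` is a hypothetical helper, and accepting AUX as well is an illustrative option, not a confirmed fix.

```python
from typing import Iterable, List


def predicate_indices(pos_tags: List[str], verb_tags: Iterable[str] = ("VERB",)) -> List[int]:
    """Return positions of tokens whose POS tag counts as a predicate."""
    allowed = set(verb_tags)
    return [i for i, tag in enumerate(pos_tags) if tag in allowed]


# POS tags for "The new rights are nice enough", copied from the comparison above.
pos_2_0_18 = ["DET", "ADJ", "NOUN", "VERB", "ADJ", "ADV"]
pos_2_2_4 = ["DET", "ADJ", "NOUN", "AUX", "ADJ", "ADV"]

print(predicate_indices(pos_2_0_18))                    # "are" is found
print(predicate_indices(pos_2_2_4))                     # nothing is found
print(predicate_indices(pos_2_2_4, ("VERB", "AUX")))    # "are" is found again
```

With the 2.2.4 tags no token qualifies under the VERB-only assumption, which would explain the empty "verbs" list reported at the top of the thread.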

@deanyan7
Author

Thank you very much for your help. The problem is solved now, thanks.
