After BERT WordPiece indexing, how do the sequence-labeling tags of the original words get mapped? #412

Open
312shan opened this issue May 2, 2022 · 1 comment

Comments

@312shan

312shan commented May 2, 2022

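BertWordPieceEncoder.index_datasets converts text to token_ids in a single step, but there doesn't seem to be a matching method that converts the text's original sequence-labeling tags along with it?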

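This causes a problem: users have to skip index_datasets and write their own tokenization plus a function that maps the original word-level labels onto NER labels for the wordpiece sequence.
So is BertWordPieceEncoder just not very convenient for NER, especially for English, where words get split into multiple wordpieces?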
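A minimal sketch of the manual workaround described above: tokenize each word into wordpieces, then expand the word-level tags onto the wordpiece sequence. The toy vocabulary and the wordpiece/align_labels helpers are hypothetical illustrations, not part of fastNLP; the splitter is a simplified greedy longest-match-first WordPiece, and continuation pieces are given an "ignore" label so a loss can skip them.

```python
from typing import List, Set

# Hypothetical toy vocabulary for illustration; a real BERT vocab has ~30k entries.
TOY_VOCAB: Set[str] = {"[UNK]", "john", "lives", "in", "new", "york",
                       "wash", "##ing", "##ton", "play", "##ed"}


def wordpiece(word: str, vocab: Set[str] = TOY_VOCAB) -> List[str]:
    """Greedy longest-match-first WordPiece split of one (lowercased) word."""
    word = word.lower()
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            sub = word[start:end] if start == 0 else "##" + word[start:end]
            if sub in vocab:
                match = sub
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # an unsplittable word becomes a single [UNK]
        pieces.append(match)
        start = end
    return pieces


def align_labels(words: List[str], labels: List[str], ignore_label=None):
    """Expand word-level tags to the wordpiece sequence.

    The first piece of each word keeps the word's tag; continuation pieces
    get ignore_label so a loss function can skip them. first_index records
    where each word starts, for mapping predictions back to words.
    """
    tokens: List[str] = []
    token_labels = []
    first_index: List[int] = []
    for word, label in zip(words, labels):
        pieces = wordpiece(word)
        first_index.append(len(tokens))
        tokens.extend(pieces)
        token_labels.extend([label] + [ignore_label] * (len(pieces) - 1))
    return tokens, token_labels, first_index
```

Wordpiece-level predictions can be mapped back to words with [pred[i] for i in first_index]; if a [CLS] token is prepended, each index shifts by one. Tagging continuation pieces with the corresponding I- tag instead of ignoring them is another common choice.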

@yhcc
Member

yhcc commented May 2, 2022

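I suggest using BertEmbedding directly; then you don't need to deal with this issue at all (at the cost of some efficiency). BertWordPieceEncoder is intended for classification tasks.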
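The reason BertEmbedding sidesteps the problem is that it returns one vector per original word rather than per wordpiece, so word-level tags stay aligned; fastNLP's BertEmbedding exposes how the pieces are combined through its pool_method argument (first, last, avg, max). The following is a minimal pure-Python sketch of that pooling idea, not fastNLP's actual code; the pool_to_words helper and its word_lengths input (number of pieces per word) are hypothetical.

```python
from typing import List


def pool_to_words(piece_vectors: List[List[float]],
                  word_lengths: List[int],
                  method: str = "first") -> List[List[float]]:
    """Collapse wordpiece vectors into one vector per original word.

    word_lengths[i] is the number of wordpieces word i was split into.
    """
    if sum(word_lengths) != len(piece_vectors):
        raise ValueError("word_lengths must cover every wordpiece exactly once")
    words: List[List[float]] = []
    start = 0
    for n in word_lengths:
        span = piece_vectors[start:start + n]
        if method == "first":
            words.append(list(span[0]))
        elif method == "last":
            words.append(list(span[-1]))
        elif method == "avg":
            # Element-wise mean over the word's pieces.
            words.append([sum(col) / n for col in zip(*span)])
        elif method == "max":
            # Element-wise max over the word's pieces.
            words.append([max(col) for col in zip(*span)])
        else:
            raise ValueError(f"unknown method: {method}")
        start += n
    return words
```

Because the output length equals the number of original words, the original tag sequence can be used as-is; the extra pooling step is where the efficiency cost mentioned in the reply comes from.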
