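BertWordPieceEncoder.index_datasets converts text to token_ids in a single step, but there does not seem to be a matching method that converts the original sequence-labeling tags along with it?
This causes a problem: users cannot use index_datasets at all, and instead have to write their own code to tokenize the text and map the original labels onto NER labels for the wordpiece sequence. So BertWordPieceEncoder seems inconvenient for NER, especially for English, where words are split into wordpieces?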
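I suggest using BertEmbedding directly, so you do not have to deal with this problem at all (there is a small loss in efficiency, though). BertWordPieceEncoder is intended for classification tasks.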
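For anyone who still wants wordpiece-level inputs for tagging, the manual alignment described in the question could look roughly like the sketch below. It is only an illustration and not part of the library: `toy_wordpiece`, `align_labels`, and the tiny `VOCAB` are hypothetical names made up for this example, and the toy tokenizer stands in for a real BERT wordpiece tokenizer. It also assumes BIO tags and one common convention, where the first piece keeps the word's tag and continuation pieces of a `B-` word become `I-`.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical toy vocabulary; a real setup would use the BERT model's vocab file.
VOCAB: Set[str] = {"johan", "##son", "lives", "in", "new", "york"}


def toy_wordpiece(word: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match-first split of one word (stand-in for a real tokenizer)."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand  # continuation pieces carry the ## prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no match at this position: whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def align_labels(
    words: List[str],
    labels: List[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Expand word-level BIO labels so there is one label per wordpiece."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    out_pieces: List[str] = []
    out_labels: List[str] = []
    for word, label in zip(words, labels):
        for i, piece in enumerate(tokenize(word)):
            out_pieces.append(piece)
            if i > 0 and label.startswith("B-"):
                # Assumed convention: only the first piece keeps the B- tag.
                out_labels.append("I-" + label[2:])
            else:
                out_labels.append(label)
    return out_pieces, out_labels


words = ["Johanson", "lives", "in", "New", "York"]
labels = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
pieces, tags = align_labels(
    words, labels, lambda w: toy_wordpiece(w.lower(), VOCAB)
)
print(list(zip(pieces, tags)))
```

Another common choice is to give continuation pieces a special ignore label and mask them out of the loss; which one works better depends on the model and the data, so treat the convention above as an assumption rather than a recommendation.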