Using a custom vocab.txt #32

Open
JokerCD opened this issue Feb 1, 2023 · 2 comments

JokerCD commented Feb 1, 2023

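Hello, and thank you for sharing this project!
I ran into a problem while following your steps: when I use my own custom vocab.txt, the tokenizer.json among the configuration files generated by running init_custdata_model.py still contains the original vocabulary and has not been updated to my custom one. As a result, processor.tokenizer.get_vocab() returns the original vocabulary, which affects encoding and decoding during training and testing.
I look forward to your reply. Thanks again!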

wenlihaoyu (Member) commented

First run python gen_vocab.py to generate the dictionary.

JokerCD (Author) commented Feb 2, 2023

> First run python gen_vocab.py to generate the dictionary.

Hello,
I did run gen_vocab.py first, and it generated the dictionary from my custom vocabulary
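
The mismatch described in this thread can be checked directly by comparing the custom vocab.txt against whatever processor.tokenizer.get_vocab() returns. The following is only a minimal diagnostic sketch, not part of the repository: the helper names load_vocab_txt and compare_vocab are hypothetical, and it assumes vocab.txt holds one token per line.

```python
def load_vocab_txt(path):
    """Read a vocab.txt-style file (assumed: one token per line), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f if line.strip()]


def compare_vocab(custom_tokens, tokenizer_vocab):
    """Compare a custom token list with a tokenizer vocabulary mapping (token -> id).

    Returns (missing, extra):
      missing: tokens in the custom list that the tokenizer does not know
      extra:   tokens the tokenizer knows that are not in the custom list
    """
    custom = set(custom_tokens)
    loaded = set(tokenizer_vocab)
    missing = sorted(custom - loaded)
    extra = sorted(loaded - custom)
    return missing, extra


if __name__ == "__main__":
    # Stand-in data for illustration; in practice the second argument would be
    # the dict returned by processor.tokenizer.get_vocab().
    custom = ["a", "b", "c"]
    loaded_vocab = {"a": 0, "b": 1, "<s>": 2}
    missing, extra = compare_vocab(custom, loaded_vocab)
    print("missing from tokenizer:", missing)
    print("only in tokenizer:", extra)
```

A non-empty missing list would confirm that the loaded tokenizer was not built from the custom vocabulary. Entries in extra are expected to include at least the tokenizer's special tokens, so that list alone does not indicate a problem.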
