Using a custom vocab.txt #32

JokerCD · 2023-02-01T03:38:52Z

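Hello, and thank you for sharing this project!
While following your steps I ran into a problem: when I use my own custom vocab.txt and then run init_custdata_model.py, the tokenizer.json among the generated configuration files still contains the original vocabulary and has not been updated to my custom one. As a result, processor.tokenizer.get_vocab() returns the original vocabulary, which affects encode and decode during both training and testing.
I look forward to your reply. Thanks again!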

wenlihaoyu · 2023-02-01T12:51:20Z

Run python gen_vocab.py first to generate the dictionary.

JokerCD · 2023-02-02T01:42:58Z

> Run python gen_vocab.py first to generate the dictionary.

Hello,
I did run gen_vocab.py first, and it generated the dictionary from my custom vocabulary.
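
One way to confirm the mismatch reported in this thread, without loading the model, is to compare the two files directly. The following is only a minimal sketch: the helper names are hypothetical, and it assumes that tokenizer.json keeps its vocabulary under model.vocab (a token-to-id mapping for WordPiece/BPE tokenizers, or a list of [token, score] pairs for Unigram). It makes no claim about what init_custdata_model.py actually writes.

```python
import json
from pathlib import Path


def read_vocab_txt(path):
    """Return the tokens listed in a vocab.txt file, one token per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def read_tokenizer_json_vocab(path):
    """Return the set of tokens stored under model.vocab in tokenizer.json."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    vocab = data.get("model", {}).get("vocab", {})
    # Assumption: vocab is a token -> id mapping (WordPiece/BPE layout).
    # Unigram tokenizers store a list of [token, score] pairs instead.
    if isinstance(vocab, dict):
        return set(vocab)
    return {entry[0] for entry in vocab}


def missing_tokens(vocab_txt, tokenizer_json):
    """List tokens from vocab.txt that are absent from tokenizer.json."""
    known = read_tokenizer_json_vocab(tokenizer_json)
    return [tok for tok in read_vocab_txt(vocab_txt) if tok not in known]
```

Calling missing_tokens with the paths to the custom vocab.txt and the generated tokenizer.json returns the custom characters that the tokenizer file does not contain. A long list would be consistent with tokenizer.json still holding the original vocabulary; an empty list means every custom token is present, though the ids may still differ.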
