This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

Controlling word tokenization #73

Open
MexicanMan opened this issue Aug 20, 2020 · 0 comments

Comments

@MexicanMan

I would like to know whether there is any way to control how a word is split into tokens, other than configuring BPE dropout.
For example, "Best" can be tokenized as ['▁Best'] or ['▁Be', 'st'], and so on. Can I choose which of these tokenizations is used?
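
To make the question concrete, here is a minimal sketch in plain Python of the kind of control I mean. It is not this library's API: `enumerate_segmentations` and `toy_vocab` are hypothetical names invented for illustration, and the vocabulary is a made-up toy set. It lists every way to split a word into pieces that exist in the vocabulary, so that one candidate could be picked explicitly rather than left to the merge order or to random dropout.

```python
from typing import List, Set


def enumerate_segmentations(word: str, vocab: Set[str]) -> List[List[str]]:
    """Return every way to split `word` into pieces that all appear in `vocab`."""
    if not word:
        # The empty string has exactly one segmentation: no pieces at all.
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in vocab:
            # Keep this prefix and segment the remainder recursively.
            for rest in enumerate_segmentations(word[end:], vocab):
                results.append([piece] + rest)
    return results


# Hypothetical toy vocabulary (an assumption, not taken from a trained model).
toy_vocab = {"▁Best", "▁Be", "▁B", "st", "e", "s", "t"}

candidates = enumerate_segmentations("▁Best", toy_vocab)
for tokens in candidates:
    print(tokens)
```

Both ['▁Best'] and ['▁Be', 'st'] show up among the candidates. What I am asking is whether the library offers a supported way to select one of them.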
