This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

Add an option to predefine special tokens #44

Open
Kyeongpil opened this issue Nov 15, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@Kyeongpil

I want to use this yttm model.

However, I want to add a [MASK] token to the vocabulary.

In this case, how can I predefine special tokens?

@yutkin yutkin added the question Further information is requested label Nov 15, 2019
@xbelonogov
Contributor

We don't support this feature right now. Maybe it will be added later.

You can use the following workaround. Just append this special token at the end of your training text many times (for example, 1000 times). In that case the model will definitely add it to the vocabulary.

@Kyeongpil
Author

@xbelonogov Thank you for your kind response.

I'll be looking forward to this feature being added, because many language models such as BERT need special tokens like [CLS], [SEP], and [MASK].

@kalaidin kalaidin changed the title How can I predefine special tokens? Add an option to predefine special tokens Nov 19, 2019
@kalaidin kalaidin added enhancement New feature or request and removed question Further information is requested labels Nov 19, 2019
@glample

glample commented Sep 11, 2020

We would also find this option very useful 👍 The workaround of repeating each special token 1000 times in the training text is fine if there are just a few tokens, but it is inconvenient when we want to provide a large set of words that should never be split.

@miguelvictor

Any progress on this? 😊
