This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

Vocabulary contains underscore multiple times? #68

Open
RuABraun opened this issue Apr 20, 2020 · 0 comments

Comments

@RuABraun
After training, if I write out the vocabulary:

for w in bpe.vocab():
    fh.write(f'{w}\n')  # fh is an open file handle

and then look inside the file, this is a subset of what I see:

_
8
-
7
3
6
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_

Why is this?
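
One way to check whether those lines really hold the same token is to write each entry in escaped form together with its code points. Strings that look alike in a text viewer can differ, for example an ASCII underscore versus U+2581 (which some subword tokenizers use as a space marker), or runs of different lengths. This is only a minimal sketch, assuming `bpe.vocab()` returns a list of strings; `describe_token` and `sample_vocab` are hypothetical names used for illustration, not part of the library.

```python
def describe_token(token):
    """Return the token in ASCII-escaped form plus each character's code point."""
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in token)
    # !a escapes non-ASCII characters, so look-alike glyphs become distinguishable.
    return f"{token!a}\t{codepoints}"


# Hypothetical stand-in for bpe.vocab(); replace with the real vocabulary list.
sample_vocab = ["_", "\u2581", "\u2581\u2581", "8"]

for index, token in enumerate(sample_vocab):
    print(index, describe_token(token))
```

If the escaped forms differ between lines, the entries are distinct tokens that merely render the same; if they are identical, the vocabulary really does contain duplicates.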
