
Tokenizer 1.31.0

@guillaumekln guillaumekln released this 07 Mar 10:10
· 44 commits to master since this release

New features

  • Add utilities to build and use vocabularies:
    • pyonmttok.Vocab
    • pyonmttok.build_vocab_from_tokens
    • pyonmttok.build_vocab_from_lines
  • Define the method Tokenizer.__call__ to simplify tokenizer usage when additional token features are not used:
    tokens = tokenizer(text)

Fixes and improvements

  • Update pybind11 to 2.9.1