Tokenizer 1.24.0

@guillaumekln released this 16 Feb 16:53

New features

  • Add verbose flag in file tokenization APIs to log progress every 100,000 lines
  • [Python] Add options property to Tokenizer instances
  • [Python] Add class pyonmttok.SentencePieceTokenizer to help create a tokenizer compatible with SentencePiece
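
To illustrate what the verbose flag reports, here is a minimal pure-Python sketch of line-by-line file tokenization with periodic progress logging. It is not the library's implementation: the function name `tokenize_stream`, the `log_every` and `log` parameters, the whitespace tokenizer, and the message format are all assumptions made for this example.

```python
import io
import sys


def tokenize_stream(lines, out, tokenize=str.split, verbose=False,
                    log_every=100_000, log=sys.stderr):
    """Tokenize lines one by one and write space-joined tokens to `out`.

    Hypothetical helper: when `verbose` is true, a progress message is
    written to `log` each time `log_every` lines have been processed.
    """
    count = 0
    for line in lines:
        tokens = tokenize(line)  # stand-in for a real tokenizer
        out.write(" ".join(tokens) + "\n")
        count += 1
        if verbose and count % log_every == 0:
            print(f"Tokenized {count} lines", file=log)
    return count


if __name__ == "__main__":
    source = io.StringIO("Hello world\nfoo bar baz\n")
    target = io.StringIO()
    # A small interval so that the progress message is visible on tiny input.
    tokenize_stream(source, target, verbose=True, log_every=1)
```

Writing the progress messages to a separate stream (standard error by default in this sketch) keeps them out of the tokenized output.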

Fixes and improvements

  • Fix deserialization into Token objects that was sometimes incorrect
  • Fix Windows compilation
  • Fix Google Test sometimes being installed as part of make install
  • [Python] Update pybind11 to 2.6.2
  • [Python] Update ICU to 66.1
  • [Python] Compile ICU with optimization flags