Releases: segment-any-text/wtpsplit
Release 2.0.5
- Fixes potential CUDA device error when the input has exactly 511 tokens (#121).
Release 2.0.4
- Fix a speed issue with SaT (#118). Now it is (as expected) ~6x faster than WtP.
Release 2.0.3
Implement SaT (https://arxiv.org/abs/2406.16678) and switch the default models to SaT🚀
The previous WtP models are still available but SaT is strictly better in accuracy and speed. See the updated README for details: https://github.com/segment-any-text/wtpsplit.
SaT was implemented and developed by @markus583 and @igorsterner.
Release 1.3.0
- Fix a bug affecting some hash embeddings of the canine-* models which reduced accuracy (please upgrade to this version!).
- Add a guide on adapting to your custom data: https://github.com/bminixhofer/wtpsplit#advanced-usage.
Release 1.2.3
- Fix an error with text whose length is not a multiple of 4 and is shorter than 512 characters in canine-s-* models (#98).
Release 1.2.2
- Add strip_whitespace flag.
- Fix a bug where some zero-length sentences were returned if there is lots of trailing whitespace.
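The two behaviors in this release can be illustrated with a small, library-independent sketch. The helper name clean_sentences and its logic are assumptions made for illustration only; this is not wtpsplit's actual implementation, just one plausible way to strip whitespace and drop zero-length segments after splitting.

```python
from typing import List


def clean_sentences(sentences: List[str], strip_whitespace: bool = False) -> List[str]:
    """Hypothetical post-processing helper (not part of wtpsplit)."""
    cleaned = []
    for sentence in sentences:
        if strip_whitespace:
            # Remove leading and trailing whitespace from each segment
            sentence = sentence.strip()
        # Drop segments that are empty or contain only whitespace,
        # e.g. ones produced by a long run of trailing spaces
        if sentence.strip() == "":
            continue
        cleaned.append(sentence)
    return cleaned


raw = ["Hello world. ", "This is a test.", "   ", ""]
print(clean_sentences(raw, strip_whitespace=True))  # ['Hello world.', 'This is a test.']
print(clean_sentences(raw))                         # ['Hello world. ', 'This is a test.']
```

Without strip_whitespace, the sketch keeps each segment's original spacing, so joining the segments approximates the input text minus the dropped empty segments.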
Release 1.2.1
Release 1.2.0
- Speed up pre- & postprocessing via better vectorization (#94).
- Proper onnxruntime support for the wtp-bert-* models, although ONNX models are currently not much faster (or are even slower) than PyTorch models for some reason. Will continue to look into that.
- Adds missing pandas requirement (fixing #92).
- Lower bounds on transformers and other requirements to make sure all the functionality we need is there.
- Removes torch from requirements since users will want to install it themselves depending on their hardware setup, and it's not required anymore when using only the ONNX models.
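Since torch is no longer a hard requirement, calling code may want to check which backend is installed before loading a model. The following is a minimal sketch using only the standard library; the function names and the preference order (onnxruntime first) are assumptions for illustration, not something wtpsplit does or exposes.

```python
import importlib.util
from typing import Dict


def available_backends() -> Dict[str, bool]:
    """Report which optional backends are importable, without importing them."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("torch", "onnxruntime")
    }


def pick_backend(backends: Dict[str, bool]) -> str:
    """Hypothetical selection policy: prefer onnxruntime, fall back to torch."""
    if backends.get("onnxruntime"):
        return "onnxruntime"
    if backends.get("torch"):
        return "torch"
    raise RuntimeError("Install either torch or onnxruntime to run a model.")


print(available_backends())
```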
Release 1.1.0
- Added missing get_threshold function.
- wtp.split, when adapted to some style, now also allows changing the threshold via wtp.split(..., threshold=threshold). Previously the threshold was overwritten by the default.
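The threshold fix amounts to a precedence rule: an explicitly passed threshold should win over the style's default. A minimal sketch of that rule follows. The function resolve_threshold, the table of defaults, and all numeric values are hypothetical and chosen for illustration; they are not wtpsplit's real values or code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-(language, style) defaults; the numbers are made up.
DEFAULT_THRESHOLDS: Dict[Tuple[str, str], float] = {
    ("en", "ud"): 0.3,
    ("en", "opus100"): 0.1,
}
FALLBACK_THRESHOLD = 0.01  # hypothetical value used when no style is given


def resolve_threshold(
    lang_code: str,
    style: Optional[str] = None,
    threshold: Optional[float] = None,
) -> float:
    """Return the threshold to use, giving an explicit value precedence."""
    # Compare against None so that an explicit 0.0 is still respected
    if threshold is not None:
        return threshold
    if style is not None:
        return DEFAULT_THRESHOLDS.get((lang_code, style), FALLBACK_THRESHOLD)
    return FALLBACK_THRESHOLD


print(resolve_threshold("en", style="ud"))                 # 0.3
print(resolve_threshold("en", style="ud", threshold=0.5))  # 0.5
```

The bug described above corresponds to skipping the first check, so the style default was always returned.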
Release 1.0.1
A major revamp of this library, now called wtpsplit!
See the Readme for details.