
v0.5.3

@aleksandr-smechov aleksandr-smechov released this 01 Apr 21:21
419278c

This release introduces an engine system for swapping the Whisper engine between faster-whisper and TensorRT-LLM.

API

  • MIT License!
  • Added the ability to swap the Whisper "engine" from the default faster-whisper to TensorRT-LLM, which is much faster. #285
  • Added support for distil models like distil-large-v2 and distil-large-v3. These work with the TensorRT-LLM engine.
  • Added a batch_size parameter to the endpoints. It doesn't do anything yet, but the TensorRT-LLM engine supports batch processing of files, and the idea is to add this feature along with dynamic batching.
  • Tightened overall control over dependencies, plus various dependency updates.
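As a rough illustration of the new engine and batch_size options, here is a minimal sketch of how a client might assemble a request to the API. The endpoint path, field names, and defaults are assumptions for illustration, not the project's confirmed interface; check the documentation for the actual API.

```python
# Hypothetical sketch of a transcription request carrying the new
# batch_size parameter. The endpoint URL and field names below are
# assumptions, not the project's confirmed API.

def build_transcribe_request(audio_path: str,
                             batch_size: int = 1) -> dict:
    """Assemble the pieces of a transcription request.

    batch_size is accepted but, per the release notes, not yet acted
    on; it is reserved for the TensorRT-LLM engine's batch processing.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return {
        "url": "http://localhost:5001/api/v1/audio",  # assumed endpoint
        "files": {"file": audio_path},
        "data": {"batch_size": batch_size},
    }

request = build_transcribe_request("meeting.wav", batch_size=4)
```

The payload dict could then be passed to an HTTP client such as `requests.post`. Since batching is not yet implemented server-side, the parameter is effectively a forward-compatibility placeholder.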

Diarization

  • Started implementing NVIDIA NeMo's new long-form diarization class. It currently still consumes too much memory.

Documentation

  • Added examples of using offline models with various backends #288 #289

Thanks to contributor @aleksandr-smechov, to the NeMo team for their work, and to the WhisperS2T project for the initial code for the TensorRT-LLM backend, and by extension, TensorRT-LLM's Whisper example.