Releases: ELS-RD/transformer-deploy

Add GPT-2 acceleration support

08 Feb 23:07
ccfeb21
  • add support for decoder-based models (GPT-2) on both ONNX Runtime and TensorRT
  • refactor triton configuration generation (simplification)
  • add GPT-2 model documentation (notebook)
  • fix CPU quantization benchmark (it was not using the quantized model)
  • fix sentence transformers bug
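
To illustrate what Triton configuration generation involves, here is a minimal sketch that renders a `config.pbtxt` for a decoder model as plain text. It is not the library's actual generator: the function name `render_triton_config`, the model name `gpt2_onnx_model`, and the tensor names `input_ids` and `logits` are assumptions chosen for illustration. Only the field names (`name`, `platform`, `max_batch_size`, `input`, `output`) and the `TYPE_*` data types follow Triton's model configuration format.

```python
def render_triton_config(name, platform, max_batch_size, inputs, outputs):
    """Render a minimal Triton config.pbtxt as a string.

    `inputs` and `outputs` are lists of (tensor_name, data_type, dims) tuples.
    This helper is hypothetical and only covers the most basic fields.
    """

    def render_tensors(tensors):
        blocks = []
        for tensor_name, data_type, dims in tensors:
            dims_text = ", ".join(str(d) for d in dims)
            blocks.append(
                "  {\n"
                f'    name: "{tensor_name}"\n'
                f"    data_type: {data_type}\n"
                f"    dims: [ {dims_text} ]\n"
                "  }"
            )
        return ",\n".join(blocks)

    return (
        f'name: "{name}"\n'
        f'platform: "{platform}"\n'
        f"max_batch_size: {max_batch_size}\n"
        f"input [\n{render_tensors(inputs)}\n]\n"
        f"output [\n{render_tensors(outputs)}\n]\n"
    )


# Assumed example: an ONNX Runtime backed GPT-2 with dynamic batch and sequence axes.
# With max_batch_size set to 0, the batch dimension is listed explicitly in dims.
config = render_triton_config(
    name="gpt2_onnx_model",          # hypothetical model name
    platform="onnxruntime_onnx",     # use "tensorrt_plan" for a TensorRT engine
    max_batch_size=0,
    inputs=[("input_ids", "TYPE_INT32", [-1, -1])],
    outputs=[("logits", "TYPE_FP32", [-1, -1, 50257])],  # 50257 = GPT-2 vocabulary size
)
print(config)
```

The resulting text would be saved as `config.pbtxt` in the model's directory inside the Triton model repository.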

add CPU support and generic GPU quantization support

28 Dec 22:52
404c5ee

Full Changelog: v0.2.0...v0.3.0

add GPU quantization support

08 Dec 22:46
ad837a9
  • support INT8 GPU quantization
  • add a tutorial to perform quantization end to end
  • add QDQRoberta model
  • switch to ONNX opset 13
  • refactoring in the TensorRT engine creation
  • fix bugs
  • add auth token (for private HF repo)
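
As a rough illustration of the arithmetic behind INT8 quantization, the sketch below applies symmetric per-tensor quantization to a short list of weights and then dequantizes them, which is the round trip that quantize/dequantize (QDQ) nodes represent. It is a simplified, pure-Python example under stated assumptions, not the library's implementation: the function names and sample values are made up for illustration, and real pipelines calibrate the scale from activation statistics rather than from a single tensor's maximum.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the range [-127, 127].

    Returns the integer codes and the scale needed to map them back to floats.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]  # made-up sample values
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The round-trip error of each value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

The largest absolute value maps to 127 or -127, and every other value is rounded to the nearest multiple of the scale, so the error per value stays within half a step.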

Full Changelog: v0.1.1...v0.2.0

update Triton image to 21.11-py3

24 Nov 08:24
fbfc1e9
  • update Docker image
  • update documentation

from PoC to library

23 Nov 13:41
1fe4ab0
  • switch from a proof of concept to a library
  • add support for the TensorRT Python API (for best performance)
  • improve documentation (move the Hugging Face Infinity material out of the main docs, add benchmarks, etc.)
  • fix issues with mixed precision
  • add license
  • add tests, GitHub Actions, Makefile
  • change the way the Docker image is built

first release

08 Nov 21:15

all the scripts to reproduce https://medium.com/p/e1be0057a51c