Add GPT-2 acceleration support

@pommedeterresautee released this 08 Feb 23:07
· 136 commits to main since this release
ccfeb21
  • add support for decoder-based models (GPT-2) on both ONNX Runtime and TensorRT
  • refactor and simplify Triton configuration generation
  • add GPT-2 model documentation (notebook)
  • fix CPU quantization benchmark (it was not using the quantized model)
  • fix sentence transformers bug