Releases: huggingface/optimum
v1.8.3: Patch release
- Fix Stable Diffusion model ONNX export by @echarlaix in #1020
- Add optimum-neuron extra by @michaelbenayoun in #1021
Full Changelog: v1.8.2...v1.8.3
v1.8: extended BetterTransformer support, ONNX merged seq2seq models
Extended BetterTransformer support
Various improvements in the PyTorch BetterTransformer integration.
- [BT] add BetterTransformer support for ProphetNet by @hirotasoshu in #923
- Improve bettertransformer benchmark script by @fxmarty in #939
- Fix sdpa with batch size = 1, better benchmark by @fxmarty in #915
- Fix slow tests & sdpa dropout by @fxmarty in #974
- Remove getattr overhead in spda by @fxmarty in #934
- [BT] Improve docs by @younesbelkada in #944
ONNX merged seq2seq models
Instead of using two separate models, decoder_model.onnx and decoder_with_past_model.onnx, a single decoder can be used for encoder-decoder models: decoder_model_merged.onnx. This avoids duplicating weights between the ONNX models without and with past.
By default, if available, decoder_model_merged.onnx will be used in the ORTModel integration. This can be disabled with the option --no-post-process in the ONNX export CLI, and with use_merged=False in the ORTModel.from_pretrained method.
Example:
optimum-cli export onnx --model t5-small t5_onnx
will give:
└── t5_onnx
├── config.json
├── decoder_model_merged.onnx
├── decoder_model.onnx
├── decoder_with_past_model.onnx
├── encoder_model.onnx
├── generation_config.json
├── special_tokens_map.json
├── spiece.model
├── tokenizer_config.json
└── tokenizer.json
decoder_model_merged.onnx alone is enough for inference. We strongly recommend inspecting the subgraphs with netron to understand the inputs and outputs, in case the exported model is to be used with an engine other than ONNX Runtime in the Optimum integration.
- Fix encoder-decoder ONNX merge by @fxmarty in #924
- Support the merge of decoder without/with past for encoder-decoder models in the ONNX export by @fxmarty in #926
- Support merged seq2seq models in ORTModel by @fxmarty in #930
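The file selection described in this section (prefer the merged decoder when it exists, unless use_merged=False is passed) can be sketched as follows. This is only an illustration of the rule: `select_decoder_files` is a hypothetical helper and not the ORTModel implementation, and the file names are the ones shown in the export tree above.

```python
from pathlib import Path
from typing import List, Optional

MERGED = "decoder_model_merged.onnx"
WITHOUT_PAST = "decoder_model.onnx"
WITH_PAST = "decoder_with_past_model.onnx"


def select_decoder_files(export_dir: str, use_merged: Optional[bool] = None) -> List[str]:
    """Pick the decoder file(s) to load from an export directory.

    Hypothetical helper (not part of optimum): the merged decoder is
    preferred when present, unless use_merged=False is passed.
    """
    root = Path(export_dir)
    has_merged = (root / MERGED).is_file()
    if use_merged is None:
        use_merged = has_merged  # default: use the merged file if available
    if use_merged:
        if not has_merged:
            raise FileNotFoundError(f"{MERGED} not found in {root}")
        return [MERGED]
    # Fall back to the separate decoders that are actually present.
    return [name for name in (WITHOUT_PAST, WITH_PAST) if (root / name).is_file()]
```

Calling `select_decoder_files("t5_onnx")` on the directory above would return only the merged file, while `use_merged=False` would return the two separate decoders.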
New models in the ONNX export
Major bugfix
- Remove constant output in encoder-decoder ONNX models decoder with past by @fxmarty in #920
- Hash tensor data during deduplication by @VikParuchuri in #932
Potentially breaking changes
The TasksManager replaces legacy task names with the canonical ones used on the Hub and in transformers metadata:
- sequence-classification becomes text-classification
- causal-lm becomes text-generation
- seq2seq-lm becomes text2text-generation
- speech2seq-lm and audio-ctc become automatic-speech-recognition
- default becomes feature-extraction
- masked-lm becomes fill-mask
- vision2seq-lm becomes image-to-text
This should not break anything unless you rely on private methods and attributes from TasksManager.
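For scripts that still pass the legacy names, the renames listed above can be captured in a small lookup table. The sketch below is illustrative only: `LEGACY_TO_CANONICAL` and `canonical_task` are hypothetical names, not part of the TasksManager API, and keeping a `-with-past` suffix across the rename is an assumption based on the task names used elsewhere in these notes.

```python
# Hypothetical helper, not part of optimum: maps the legacy task names to the
# canonical names listed in the release notes.
LEGACY_TO_CANONICAL = {
    "sequence-classification": "text-classification",
    "causal-lm": "text-generation",
    "seq2seq-lm": "text2text-generation",
    "speech2seq-lm": "automatic-speech-recognition",
    "audio-ctc": "automatic-speech-recognition",
    "default": "feature-extraction",
    "masked-lm": "fill-mask",
    "vision2seq-lm": "image-to-text",
}

WITH_PAST_SUFFIX = "-with-past"


def canonical_task(task: str) -> str:
    """Return the canonical name for a task, leaving unknown names untouched."""
    # Assumption: a "-with-past" suffix is preserved across the rename.
    suffix = ""
    if task.endswith(WITH_PAST_SUFFIX):
        task, suffix = task[: -len(WITH_PAST_SUFFIX)], WITH_PAST_SUFFIX
    return LEGACY_TO_CANONICAL.get(task, task) + suffix
```

Names that are already canonical pass through unchanged, so the helper can be applied unconditionally.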
What's Changed
- Update ort trainer to transformers 4.27.2 by @JingyaHuang in #917
- Compute Loss inside the training step. by @AdamLouly in #686
- Fix ORTModel MRO for whisper by @fxmarty in #919
- add ORTStableDiffusionPipeline reference in documentation by @echarlaix in #890
- Fix decoder ONNX model loading from the Hub by @fxmarty in #929
- optimum-cli onnxruntime quantize / optimize output argument is now required by @michaelbenayoun in #927
- Register mechanism for the Optimum CLI by @michaelbenayoun in #928
- Ensure backward compatibility of ORTModel by @fxmarty in #933
- Update the README by @michaelbenayoun in #925
- Update README by @echarlaix in #941
- Update readme by @echarlaix in #942
- Remove GC from README by @michaelbenayoun in #943
- Add user and token for CI by @michaelbenayoun in #945
- Update README by @echarlaix in #946
- optimum-cli print the help of subcommands by @michaelbenayoun in #940
- Remove from_transformers references from the documentation by @fxmarty in #935
- Turn command import into optional by @JingyaHuang in #936
- Auto-set use_merged to False if use_cache is passed as False by @fxmarty in #954
- Raise error with use_cache=False, use_io_binding=True by @fxmarty in #955
- Add an ORT training notebook by @JingyaHuang in #959
- Fix issue with doc build sometimes failing silently in GH workflows by @regisss in #960
- Fix typos by @regisss in #963
- Disable tests upon transformers 4.28 release by @fxmarty in #976
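Two of the changes above (#954 and #955) concern how the use_cache, use_merged and use_io_binding flags interact. A minimal sketch of that kind of argument check is given below. `resolve_decoder_flags` is a hypothetical function, not the ORTModel code, and the choice to auto-set use_merged only when it was left unspecified is an assumption.

```python
from typing import Optional, Tuple


def resolve_decoder_flags(
    use_cache: bool = True,
    use_merged: Optional[bool] = None,
    use_io_binding: bool = False,
) -> Tuple[bool, Optional[bool], bool]:
    """Hypothetical validation of the flag combinations named in #954 and #955."""
    # Combination rejected according to #955.
    if not use_cache and use_io_binding:
        raise ValueError("use_cache=False cannot be combined with use_io_binding=True")
    # Auto-setting described in #954 (assumed to apply only when use_merged is unset).
    if not use_cache and use_merged is None:
        use_merged = False
    return use_cache, use_merged, use_io_binding
```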
New Contributors
- @hirotasoshu made their first contribution in #923
- @VikParuchuri made their first contribution in #932
Full Changelog: v1.7.3...v1.8.2
v1.7.3: Patch release for PyTorch 2.0 and transformers 4.27.0
This patch release fixes a few bugs related to the PyTorch 2.0 release, and includes a few new features as well.
Breaking change: constant outputs removed from ONNX encoder-decoder models
We removed some constant past key values outputs from encoder-decoder models in the ONNX export. Beware that this could potentially break your existing code, but we recommend using the newly exported models, as this removes unnecessary Identity nodes in the models.
- Remove constant outputs from decoder with past ONNX model for encoder-decoder architectures by @fxmarty in #872
torch.nn.functional.scaled_dot_product_attention support for decoders in BetterTransformer
PyTorch 2.0 introduces in beta torch.nn.functional.scaled_dot_product_attention, a fastpath for attention extending their accelerated transformer features. This is included in optimum.bettertransformer to be used with the following architectures: Bart, Blenderbot, GPT2, GPT-J, M2M100, Marian, Mbart, OPT, Pegasus, T5.
Beware that this is still experimental and speedups have yet to be validated on all architectures.
PyTorch's scaled_dot_product_attention allows using flash attention and memory-efficient attention natively in PyTorch.
Usage is as follows:
from transformers import AutoTokenizer, AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model = BetterTransformer.transform(model) # modify transformers modeling to use native scaled_dot_product_attention
# do your inference or training here
model = BetterTransformer.reverse(model) # go back to using canonical transformers modeling
model.save_pretrained("gpt2_model")
Inference benchmark (on fp16):
Model | batch size | Input sequence length | Generated tokens | Latency eager (s) | Latency BT (s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
---|---|---|---|---|---|---|---|---|---|
gpt2 | 1 | 64 | 256 | 1.800 | 1.607 | 12.0% | 569.90 | 569.89 | 0% |
gpt2 | 64 | 64 | 256 | 2.159 | 1.617 | 33.5% | 2067.45 | 2093.80 | 0% |
opt-1.3b | 1 | 64 | 256 | 3.010 | 2.667 | 12.9% | 5408.238 | 5408.238 | 0% |
gpt-neox-20b | 1 | 64 | 256 | 10.869 | 9.937 | 9.4% | 83670.67 | 83673.53 | 0% |
Training benchmark (on fp16):
Model | batch size | Sequence length | time/epoch (eager, s) | time/epoch (BT, s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
---|---|---|---|---|---|---|---|---|
gpt2 | 8 | 1024 | 17.732 | 14.037 | 26.3% | 13291.16 | 10191.52 | 30.4% |
gpt2 | 32 | 1024 | 17.336 | 13.309 | 30.3% | 52834.83 | 38858.56 | 36.0% |
gpt2 | 64 | 1024 | OOM | 14.067 | / | OOM | 75600.08 | / |
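The Speedup and Memory savings columns are not defined in the notes. For the rows checked here, they appear consistent with the ratio eager / BT minus one, expressed as a percentage. The sketch below recomputes a few entries under that assumption; `relative_gain` is a name introduced for illustration only.

```python
def relative_gain(eager: float, bt: float) -> float:
    """Relative gain of BetterTransformer over eager mode, in percent.

    Assumed formula (inferred from the tables, not stated in the notes):
    (eager / bt - 1) * 100.
    """
    return (eager / bt - 1.0) * 100.0


# Latency (s) for gpt2, batch size 1, from the inference table.
inference_speedup = relative_gain(1.800, 1.607)
# Time per epoch (s) and peak memory (MB) for gpt2, batch size 8, from the training table.
training_speedup = relative_gain(17.732, 14.037)
training_memory_savings = relative_gain(13291.16, 10191.52)

print(f"{inference_speedup:.1f}% {training_speedup:.1f}% {training_memory_savings:.1f}%")
```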
Benchmarks can be reproduced using the inference script and training script:
python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256
python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256 --seqlen-stdev 0
- Add scaled_dot_product_attention support for decoder models by @fxmarty in #853
- Support scaled_dot_product_attention for t5 by @fxmarty in #856
- [BT] add decoder benchmark script by @younesbelkada in #857
- [BT] Fix bt benchmark by @younesbelkada in #858
- Fix pytorch version check in bettertransformer by @fxmarty in #862
- [BT] Add fp16 support by @younesbelkada in #859
- [BT] Add decoder training support by @younesbelkada in #860
- Bart support scaled_dot_product_attention by @fxmarty in #863
- [BT] add accelerate_test markers by @younesbelkada in #864
- Mbart, pegasus, blenderbot, marian, m2m_100 support scaled_dot_product_attention by @fxmarty in #865
- Add bettertransformer reverse transform by @fxmarty in #868
- Add bettertransformer training benchmark script by @fxmarty in #873
New architectures in the ONNX export
Three additional architectures are supported in the ONNX export: ImageGPT, RegNet, OPT.
- Adding ONNX support for ImageGPT by @adit299 in #819
- Add ONNX support for RegNet by @asrimanth in #833
- Adding support for Facebook's OPT models by @hivaze in #852
(WIP) TFLite export with quantization support
Continued progress in the TFLite export with quantization support. This is work in progress and not documented yet.
- Quantization with TFLite by @michaelbenayoun in #854
Bugfixes and improvements
- Update documentation by @echarlaix in #843
- Fix typo in documentation by @regisss in #848
- Remove redundant code by @mht-sharma in #841
- Update README by @echarlaix in #850
- Update documentation by @echarlaix in #855
- Remove iobinding ORTModelForCTC by @mht-sharma in #840
- Fix typo in documentation by @echarlaix in #861
- Fix causal-lm ONNX axis names by @fxmarty in #871
- add NNCF openvino notebook by @echarlaix in #875
- Remove positional-only parameters not supported by python < v3.8 by @echarlaix in #881
- lazy import for task manager by @JingyaHuang in #844
- Remove onnx and ort dependencies on the TasksManager by @michaelbenayoun in #846
- Reactivate export & optimization tests for causal-lm models by @fxmarty in #885
- Fix ONNX export on transformers 4.27 release by @fxmarty in #884
- Do not use scaled_dot_product_attention for stable diffusion onnx export by @fxmarty in #888
- Fix loading of an ONNX stable diffusion model when config doesn't match by @echarlaix in #887
- Automatic framework detection in TasksManager for large models by @fxmarty in #883
- Fix WavLM onnx export upon torch 2.0 release by @fxmarty in #889
- Fix PushToHubMixin._create_repo according to transformers 4.27 release by @fxmarty in #892
- Fix stable diffusion framework detection by @fxmarty in #893
- Add donut CPU inference ORT by @mht-sharma in #761
- Fix check_model for large merged ONNX models by @fxmarty in #896
- Drop python 3.7 support by @fxmarty in #891
- Fix dummy label generator for vision tasks by @JingyaHuang in #900
- Add stable diffusion dummy object by @echarlaix in #899
- Automatic support for large ONNX models in ORTOptimizer by @fxmarty in #886
- Remove subprocess calls in ONNX export by @fxmarty in #897
- Registering mechanism for the TasksManager by @michaelbenayoun in https://github.com/huggingface/optimum/pull...
v1.7.1: Patch release
Temporarily fix a critical bug in BetterTransformer #849
Full Changelog: v1.7.0...v1.7.1
v1.7.0: ONNX export extension, TFLite export, single-ONNX decoding, ONNX Runtime extension for audio, vision tasks, stable diffusion
New models supported in the ONNX export
Additional architectures are supported in the ONNX export: PoolFormer, Pegasus, Audio Spectrogram Transformer, Hubert, SEW, Speech2Text, UniSpeech, UniSpeech-SAT, Wav2Vec2, Wav2Vec2-Conformer, WavLM, Data2Vec Audio, MPNet, stable diffusion VAE encoder, vision encoder decoder, Nystromformer, Splinter, GPT NeoX.
- Add PoolFormer support in exporters.onnx by @BakingBrains in #646
- Support pegasus exporters by @mht-sharma in #620
- Audio models support with optimum.exporters.onnx by @michaelbenayoun in #622
- Add MPNet ONNX export by @jplu in #691
- Add stable diffusion VAE encoder export by @echarlaix in #705
- Add vision encoder decoder model in exporters by @mht-sharma in #588
- Nystromformer ONNX export by @whr778 in #728
- Support Splinter exporters (#555) by @Allanbeddouk in #736
- Add gpt-neo-x support by @sidthekidder in #745
New models supported in BetterTransformer
A few additional architectures are supported in BetterTransformer: RoCBERT, RoFormer, Marian
- Add RoCBert support for Bettertransformer by @shogohida in #542
- Add better transformer support for RoFormer by @manish-p-gupta in #680
- added BetterTransformer support for Marian by @IlyasMoutawwakil in #808
Additional tasks supported in the ONNX Runtime integration
The following classes are added: ORTModelForMaskedLM, ORTModelForVision2Seq, ORTModelForAudioClassification, ORTModelForCTC, ORTModelForAudioXVector, ORTModelForAudioFrameClassification, ORTStableDiffusionPipeline.
Reference: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort and https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models#export-and-inference-of-stable-diffusion-models
- Add ORTModelForMaskedLM class by @JingyaHuang in #729
- Add ORTModelForVision2Seq for VisionEncoderDecoder models inference by @mht-sharma in #742
- Add ORTModelXXX for audio by @mht-sharma in #774
- Add stable diffusion onnx runtime pipeline by @echarlaix in #786
Support of the ONNX export from PyTorch on float16
In the ONNX export, it is possible to pass the options --fp16 --device cuda to export using float16 when a GPU is available, directly with the native torch.onnx.export.
Example: optimum-cli export onnx --model gpt2 --fp16 --device cuda gpt2_onnx/
TFLite export
TFLite export is now supported, with static shapes:
optimum-cli export tflite --help
optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/
- exporters.tflite initial support by @michaelbenayoun in #716
- TFLite auto-encoder models by @michaelbenayoun in #757
- [TFLite Export] Adds support for ResNet by @sayakpaul in #813
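Because the TFLite export uses static shapes, inputs fed to the exported model have to match the sequence length chosen at export time (128 in the command above). A minimal sketch of what that implies on the caller side is given below. `pad_or_truncate` is a hypothetical helper and the pad token id of 0 is an assumption; in practice a tokenizer's own padding and truncation options would normally be used instead.

```python
from typing import List


def pad_or_truncate(input_ids: List[int], sequence_length: int = 128, pad_token_id: int = 0) -> List[int]:
    """Force a list of token ids to exactly `sequence_length` entries.

    Hypothetical helper: longer inputs are cut, shorter ones are right-padded
    with `pad_token_id` (0 is an assumed value).
    """
    ids = input_ids[:sequence_length]
    return ids + [pad_token_id] * (sequence_length - len(ids))
```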
ONNX Runtime optimization and quantization directly in the CLI
- Add optimize and quantize command CLI by @jplu in #700
- Support ONNX Runtime optimizations in exporters.onnx by @fxmarty in #807
The ONNX export optionally supports ONNX Runtime optimizations directly in the export, by passing an option from --optimize O1 up to --optimize O4:
optimum-cli export onnx --help
optimum-cli export onnx --model t5-small --optimize O3 t5small_onnx/
ONNX Runtime quantization is supported directly in the command line, using optimum-cli onnxruntime quantize:
optimum-cli onnxruntime quantize --help
optimum-cli onnxruntime quantize --onnx_model distilbert_onnx --avx512
ONNX Runtime optimization is supported directly in the command line, using optimum-cli onnxruntime optimize:
optimum-cli onnxruntime optimize --help
optimum-cli onnxruntime optimize --onnx_model distilbert_onnx -O3
ORTModelForCausalLM supports decoding with a single ONNX
Up to now, for decoders, two ONNX models were used:
- One handling the first forward pass where no past key values have been cached yet - thus not taking them as input.
- One handling the following forward pass where past key values have been cached, thus taking them as input.
This release introduces support, in the ONNX export and in ORTModelForCausalLM, for a single ONNX handling both steps of the decoding. This reduces memory usage, as weights are not duplicated between two separate models during inference.
A single ONNX for decoders can be used by passing use_merged=True to ORTModelForCausalLM.from_pretrained, loading directly from a PyTorch model:
from optimum.onnxruntime import ORTModelForCausalLM
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_merged=True)
Alternatively, using a single ONNX for decoders is the default behavior in the ONNX export, and the result can later be used, for example, with ORTModelForCausalLM. The command optimum-cli export onnx --model gpt2 gpt2_onnx/ will produce:
└── gpt2_onnx
├── config.json
├── decoder_model_merged.onnx
├── decoder_model.onnx
├── decoder_with_past_model.onnx
├── merges.txt
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── vocab.json
The decoder_model.onnx and decoder_with_past_model.onnx files are kept separate for backward compatibility, but during inference using solely decoder_model_merged.onnx is enough.
- Enable inference with a merged decoder in ORTModelForCausalLM by @JingyaHuang in #647
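The two decoding steps described in this section can be illustrated with a toy, pure-Python stand-in. Nothing here is ONNX or Optimum code: `toy_decoder` and `generate` are made-up functions, and the "prediction" is an arbitrary formula. The point is only that one function body handles both the first pass (no cache) and the following passes (cache passed in), which is the idea behind a merged decoder.

```python
from typing import List, Optional, Tuple


def toy_decoder(tokens: List[int], past: Optional[List[int]] = None) -> Tuple[int, List[int]]:
    """Toy stand-in for a merged decoder.

    Without `past`, every input token is processed (first forward pass).
    With `past`, only the newest token is processed and the cache is reused.
    """
    if past is None:
        cache = list(tokens)          # first pass: build the cache from scratch
    else:
        cache = past + [tokens[-1]]   # later passes: extend the existing cache
    next_token = sum(cache) % 7       # arbitrary "prediction" for illustration
    return next_token, cache


def generate(prompt: List[int], steps: int) -> List[int]:
    """Greedy loop: the first call has no cache, later calls reuse it."""
    tokens, past = list(prompt), None
    for _ in range(steps):
        next_token, past = toy_decoder(tokens, past)
        tokens.append(next_token)
    return tokens
```

With two separate models, the two branches of `toy_decoder` would live in two different files, each carrying its own copy of the weights.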
Single-file ORTModel accepts numpy arrays
ORTModel accepts numpy arrays as inputs, in addition to PyTorch tensors. This is only the case for models that use a single ONNX.
ORTOptimizer support for ORTModelForCausalLM
- ORTOptimizer support ORTModelForCausalLM by @fxmarty in #794
- Support IO Binding for merged decoder by @fxmarty in #797
Breaking changes
- In the ONNX export, exporting models in several ONNX (encoder, decoder) is now the default behavior: #747. The old behavior is still accessible with --monolith.
- In decoders, reusing past key values is now the default in the ONNX export: #748. The old behavior is still accessible by explicitly passing, for example, --task causal-lm instead of --task causal-lm-with-past.
- BigBird support in the ONNX export is removed, due to the block_sparse attention type being written in pure numpy in Transformers, and hence not exportable to ONNX: #778
- The parameter from_transformers of ORTModel.from_pretrained will be deprecated in favor of export.
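A parameter rename such as from_transformers to export is commonly handled with a small compatibility shim that still accepts the old argument and warns about it. The sketch below shows that general pattern only. `resolve_export_flag` is a hypothetical function, and neither the warning category nor the message is taken from the Optimum code base.

```python
import warnings
from typing import Optional


def resolve_export_flag(export: bool = False, from_transformers: Optional[bool] = None) -> bool:
    """Hypothetical shim: accept the legacy `from_transformers` argument,
    warn about it, and fold its value into `export`."""
    if from_transformers is not None:
        warnings.warn(
            "`from_transformers` is deprecated, use `export` instead.",
            FutureWarning,
            stacklevel=2,
        )
        export = export or from_transformers
    return export
```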
Bugfixes and improvements
- Fix disable shape inference for optimization by @regisss in #652
- Fix uninformative message when passing use_cache=True to ORTModel and no ONNX with cache is available by @fxmarty in #650
- Fix provider options when several providers are passed by @fxmarty in #653
- Add TensorRT engine to ONNX Runtime GPU documentation by @fxmarty in #657
- Improve documentation around ONNX export by @fxmarty in #666
- minor updates on ONNX config guide by @mszsorondo in #662
- Fix FlaubertOnnxConfig by @michaelbenayoun in #669
- Use nvcr.io/nvidia/tensorrt image for GPU tests by @fxmarty in #660
- Better Transformer doc fix by @HamidShojanazeri in #670
- Add support for LongT5 optimization using ORT transformer optimizer script by @kunal-vaishnavi in #683
- Add test for missing execution providers error messages by @fxmarty in #659
- ONNX transformation to cast int64 constants to int32 when possible by @fxmarty in #655
- Add missing normalized configs by @fxmarty in #694
- Remove code duplication in ORTModel's load_model by @fxmarty in #695
- Test more architectures in ORTModel by @fxmarty in #675
- Avoid initializing unwanted attributes for ORTModel's having several inference sessions by @fxmarty in #696
- Fix the ORTQuantizer loading from specific file by @echarlaix in #701
- Add saving of diffusion model additional components ...
v1.6.4: Patch release
Bugfix
- Fix past key/value reuse in decoders following transformers 4.26.0 release and renaming: b9211d6
- ONNX Runtime 1.14 support: #772
Full Changelog: v1.6.3...v1.6.4
v1.6.3: Patch release
Fixes ORTTrainer for inference with the ONNX Runtime backend.
v1.6.2: Patch release
Hotfixes
Regressions
The export of speech-to-text architectures as a single ONNX file (that handles both the encoding and decoding) fails due to a regression with the latest transformers version: #721
Full Changelog: v1.6.1...v1.6.2
v1.6.1: Patch release
Hotfixes
- Revert breaking removal of EncoderOnnxConfig, DecoderOnnxConfig, _DecoderWithLMhead by @fxmarty in #643
- Fix item access of some _TASKS_TO_AUTOMODELS by @fxmarty in #642
Full Changelog: v1.6.0...v1.6.1
v1.6.0: Optimum CLI, Stable Diffusion ONNX export, BetterTransformer & ONNX support for more architectures
Optimum CLI
The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:
optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/
Stable Diffusion ONNX export
Optimum now supports the ONNX export of stable diffusion models from the diffusers library:
optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/
- Add Stable Diffusion ONNX export by @echarlaix in #570
BetterTransformer support for more architectures
BetterTransformer integration includes new models in this release: CLIP, RemBERT, mBART, ViLT, FSMT
The complete list of supported models is available in the documentation.
- [BT] Add Bettertransformer support for FSMT by @Sumanth077 in #494
- [BT] add BetterTransformer support for ViLT architecture by @ka00ri in #508
- Add MBart support for BetterTransformer by @ravenouse in #516
- Add CLIP BetterTransformer by @fxmarty in #534
- Add BetterTransformer support for RemBERT by @hchings in #545
ONNX export for more architectures
The ONNX export now supports Swin, MobileNet-v1, MobileNet-v2.
- Add Swin support in exporters.onnx by @fxmarty in #528
- [ONNX] add mobilenet support by @younesbelkada in #633
Extended ONNX export for encoder-decoder and decoder models
Encoder-decoder or decoder-only models normally making use of the generate() method in transformers can now be exported in several files using the --for-ort argument:
optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx
yielding:
.
└── t5_small_onnx
├── config.json
├── decoder_model.onnx
├── decoder_with_past_model.onnx
├── encoder_model.onnx
├── special_tokens_map.json
├── spiece.model
├── tokenizer_config.json
└── tokenizer.json
When passing --for-ort, exported models are expected to be loadable directly into ORTModel.
- Add ort export in exporters for encoder-decoder models by @mht-sharma in #497
- Support decoder generated with --for-ort from optimum.exporters.onnx in ORTDecoder by @fxmarty in #554
Support for ONNX models with external data at export, optimization, quantization
The ONNX export from PyTorch normally creates external data in case the exported model is larger than 2 GB. This release introduces better support for the export and use of large models, writing all external data into a .onnx_data file if necessary.
- Handling ONNX models with external data by @NouamaneTazi in #586
- Improve the compatibility dealing with large ONNX proto in ORTOptimizer and ORTQuantizer by @JingyaHuang in #332
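Whether a model is likely to cross the 2 GB threshold mentioned above can be roughly estimated from its parameter count. The sketch below is a back-of-the-envelope check only: `needs_external_data` is a hypothetical helper, the threshold is taken as 2 GiB, the estimate ignores everything but the weights, and the parameter counts in the usage lines are round illustrative figures.

```python
TWO_GB_BYTES = 2 * 1024**3  # threshold from the notes, taken here as 2 GiB (assumption)


def needs_external_data(num_parameters: int, bytes_per_parameter: int = 4) -> bool:
    """Rough estimate: do the weights alone exceed the 2 GB threshold?

    Hypothetical helper; 4 bytes per parameter corresponds to float32 weights.
    """
    return num_parameters * bytes_per_parameter > TWO_GB_BYTES


# Illustrative round figures, not exact model sizes.
small_model = needs_external_data(124_000_000)       # about 0.5 GB in float32
large_model = needs_external_data(1_300_000_000)     # about 5.2 GB in float32
```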
ONNX Runtime API improvement
Various improvements to allow for a better user experience in the ONNX Runtime integration:
- ORTModel, ORTModelDecoder and ORTModelForConditionalGeneration can now load any ONNX model files regardless of their names, allowing optimized and quantized models to be loaded without having to specify a file name argument.
- ORTModel.from_pretrained() with from_transformers=True now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.
- ORTQuantizer.save_pretrained() now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.
- ORTOptimizer.save_pretrained() now saves the preprocessor, making the exported directory usable end-to-end.
- ONNX Runtime integration API improvement by @michaelbenayoun in #515
Custom shapes support at ONNX export
The shape of the example input to provide for the export to ONNX can be overridden in case the validity of the ONNX model is sensitive to the shape used during the export.
Read more: optimum-cli export onnx --help
- Support custom shapes for dummy inputs by @fxmarty in #522
- Support for custom input shapes in exporters onnx by @fxmarty in #575
Enable use_cache=True for ORTModelForCausalLM
Reusing past key values for models using ORTModelForCausalLM (e.g. gpt2) is now possible using use_cache=True, which avoids recomputing them at each iteration of the decoding:
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)
inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")
gen_tokens = model.generate(**inputs)
tokenizer.batch_decode(gen_tokens)
- Enable past_key_values for ORTModelForCausalLM by @echarlaix in #326
IO binding support for ORTModelForCustomTasks
ORTModelForCustomTasks now supports IO Binding when using CUDAExecutionProvider.
- Add IO binding support for custom ORTModel by @JingyaHuang in #447
Experimental support to merge ONNX decoder with/without past key values
Along with --for-ort, passing --task causal-lm-with-past, --task seq2seq-lm-with-past or --task speech2seq-lm-with-past during the ONNX export exports two models: one not using the previously computed keys/values, and one using them.
Experimental support is introduced to merge the two models into one. Example:
optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/
import onnx
from optimum.onnx import merge_decoders
decoder = onnx.load("t5_onnx/decoder_model.onnx")
decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")
merged_model = merge_decoders(decoder, decoder_with_past)
onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")
- Merge ONNX decoder models by @JingyaHuang in #587
Major bugs fixed
- Fix BetterTransformer with padding="max_length" by @fxmarty in #543
- Fix non-nesting bug in BetterTransformer integration by @younesbelkada in #637
Other changes, bugfixes and improvements
- Fix doc-builder permission error by @mishig25 in #482
- Fix doc build pr permissions by @mishig25 in #484
- Re-order the task manager doc by @michaelbenayoun in #483
- Fix whisper device for gpu test by @fxmarty in #486
- Fix tensorflow CI by @fxmarty in #489
- Fix PR doc generation by @regisss in #495
- Fix broken links in the doc by @fxmarty in #499
- Update iobinding ORT encoder whisper by @mht-sharma in #498
- fix NormalizedConfig init error message by @PaulQbFeng in #500
- Change import structure for ORTModel by @fxmarty in #456
- [BT] Fix failing CI tests by @younesbelkada in #501
- Remove redundant condition statement in ORTDecoder(Seq2seq) by @JingyaHuang in #504
- [BT] put decorator on the correct place by @younesbelkada in #509
- [BT] clearer error message for norm_first by @younesbelkada in #510
- Deprecate PyTorch 1.12. for BetterTransformer by @fxmarty in #513
- Fix ORTModelForSeq2SeqLM test by @fxmarty in #455
- Clearer error messages when initializing the requested ONNX Runtime execution provider fails by @fxmarty in #514
- [BT] Fix doc bugs by @younesbelkada in #517
- Replace sklearn by scikit-learn by @lesteve in #502
- ORTModel uses optimum.exporters.onnx by @michaelbenayoun in #490
- Cleanup deprecated ONNX Runtime training docker files by @JingyaHuang in #523
- Added support for Tapas Model by @juheon...