Releases: huggingface/transformers
v4.1.1: TAPAS, MPNet, model parallelization, Sharded DDP, conda, multi-part downloads.
TAPAS (@NielsRogge)
Four new models are released as part of the TAPAS implementation: TapasModel, TapasForQuestionAnswering, TapasForMaskedLM and TapasForSequenceClassification, in PyTorch.
TAPAS is a question answering model, used to answer queries given a table. It is a multi-modal model, joining text for the query and tabular data.
The TAPAS model was proposed in TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
- Tapas v4 (tres) #9117 (@NielsRogge)
- AutoModelForTableQuestionAnswering #9154 (@LysandreJik)
- TableQuestionAnsweringPipeline #9145 (@LysandreJik)
MPNet (@StillKeepTry)
Six new models are released as part of the MPNet implementation: MPNetModel, MPNetForMaskedLM, MPNetForSequenceClassification, MPNetForMultipleChoice, MPNetForTokenClassification, MPNetForQuestionAnswering, in both PyTorch and TensorFlow.
MPNet introduces a novel self-supervised objective named masked and permuted language modeling for language understanding. It inherits the advantages of both masked language modeling (MLM) and permuted language modeling (PLM) to address the limitations of each, and further reduces the inconsistency between the pre-training and fine-tuning paradigms.
The MPNet model was proposed in MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
- MPNet: Masked and Permuted Pre-training for Language Understanding #8971 (@StillKeepTry)
Model parallel (@alexorona)
Model parallelism is introduced, allowing users to load very large models on two or more GPUs by spreading the model layers over them. This can allow GPU training even for very large models.
- gpt2 and t5 parallel modeling #8696 (@alexorona)
- Model parallel documentation #8741 (@LysandreJik)
- Patch model parallel test #8825, #8920 (@LysandreJik)
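As a rough illustration of "spreading the model layers" over GPUs, the sketch below builds a mapping from device index to the layer indices placed on that device. The helper name build_device_map is hypothetical, and the assumption that such a dictionary can be passed to a model's parallelize method should be checked against the model parallel documentation for your version.

```python
from typing import Dict, List


def build_device_map(num_layers: int, num_devices: int) -> Dict[int, List[int]]:
    """Assign contiguous blocks of layer indices to each device (hypothetical helper)."""
    if num_devices < 1 or num_layers < num_devices:
        raise ValueError("need at least one device and one layer per device")
    base, extra = divmod(num_layers, num_devices)
    device_map: Dict[int, List[int]] = {}
    start = 0
    for device in range(num_devices):
        # The first `extra` devices take one additional layer each.
        size = base + (1 if device < extra else 0)
        device_map[device] = list(range(start, start + size))
        start += size
    return device_map


device_map = build_device_map(num_layers=12, num_devices=2)
print(device_map)
# Assumption: with the library installed and two GPUs available, a mapping of
# this shape is what a call such as model.parallelize(device_map) expects.
```

Keeping each device's layers contiguous means activations only cross a device boundary once per device, which is the usual reason for this layout.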
Conda release (@LysandreJik)
Transformers welcomes its first conda releases, with v4.0.0, v4.0.1 and v4.1.0. The conda packages are now officially maintained on the huggingface channel.
- Put Transformers on Conda #8918 (@LysandreJik)
Multi-part uploads (@julien-c)
For the first time, very large models can be uploaded to the model hub, by using multi-part uploads.
New examples and reorganization (@sgugger)
We introduced a refactored SQuAD example & notebook, which is faster and simpler than the previous scripts.
The example directory has been re-ordered as we introduce the separation between "examples", which are maintained examples showcasing how to do one specific task, and "research projects", which are bigger projects and maintained by the community.
Introduction of fairscale with Sharded DDP (@sgugger)
We introduce support for fairscale's ShardedDDP in the Trainer, allowing reduced memory usage when training models in a distributed fashion.
- Experimental support for fairscale ShardedDDP #9139 (@sgugger)
- Fix gradient clipping for Sharded DDP #9168 (@sgugger)
Barthez (@moussaKam)
The BARThez model is a French variant of the BART model. We welcome its specific tokenizer to the library and multiple checkpoints to the model hub.
- Add barthez model #8393 (@moussaKam)
General improvements and bugfixes
- disable_ngram_loss fix for prophetnet #8554 (@Zhylkaaa)
- Fix run_ner script #8664 (@sgugger)
- [tokenizers] convert_to_tensors: don't reconvert when the type is already right #8283 (@stas00)
- [examples/seq2seq] fix PL deprecation warning #8577 (@stas00)
- Add sentencepiece to the CI and fix tests #8672 (@sgugger)
- Alternative to globals() #8667 (@sgugger)
- Update the bibtex with EMNLP demo #8678 (@JetRunner)
- Document adam betas TrainingArguments #8688 (@sgugger)
- Fix rag finetuning + add finetuning test #8585 (@lhoestq)
- moved temperature warper before topP/topK warpers #8686 (@theorm)
- Vectorize RepetitionPenaltyLogitsProcessor to improve performance #8598 (@bdalal)
- [Generate Test] fix flaky ci #8694 (@patrickvonplaten)
- Fix bug in x-attentions output for roberta and harden test to catch it #8660 (@ysgit)
- Add pip install update to resolve import error in transformers notebook #8616 (@jessicayung)
- Improve bert-japanese tokenizer handling #8659 (@julien-c)
- Change default cache path #8734 (@sgugger)
- [trainer] make generate work with multigpu #8716 (@stas00)
- consistent ignore keys + make private #8737 (@stas00)
- Fix max length in run_plm script #8738 (@sgugger)
- Add early stopping callback to pytorch trainer #8581 (@cbrochtrup)
- Support various BERT relative position embeddings (2nd) #8276 (@zhiheng-huang)
- Fix slow tests v2 #8746 (@LysandreJik)
- MT5 should have an autotokenizer #8743 (@LysandreJik)
- added instructions for syncing upstream master with forked master via PR #8745 (@bdalal)
- fix rag index names in eval_rag.py example #8730 (@lhoestq)
- [core] implement support for run-time dependency version checking #8645 (@stas00)
- New TF model inputs #8602 (@jplu)
- Big model table #8774 (@sgugger)
- Attempt to get a better fix for QA #8768 (@Narsil)
- Fix QA argument handler #8765 (@LysandreJik)
- Return correct Bart hidden state tensors #8747 (@joeddav)
- [XLNet] Fix mems behavior #8567 (@patrickvonplaten)
- [s2s] finetune.py: specifying generation min_length #8478 (@danyaljj)
- Revert "[s2s] finetune.py: specifying generation min_length" #8805 (@patrickvonplaten)
- Fix PPLM #8779 (@chutaklee)
- [s2s finetune trainer] potpurri of small fixes #8807 (@stas00)
- [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes #8791 (@KristianHolsheimer)
- [Flax test] Add require pytorch to flix flax test #8816 (@patrickvonplaten)
- Fix dpr<>bart config for RAG #8808 (@patrickvonplaten)
- Extend typing to path-like objects in PretrainedConfig and PreTrainedModel #8770 (@gcompagnoni)
- Fix setup.py on Windows #8798 (@jplu)
- BART & FSMT: fix decoder not returning hidden states from the last layer #8597 (@maksym-del)
- suggest a numerical limit of 50MB for determining @slow #8824 (@stas00)
- [MT5] Add use_cache to config #8832 (@patrickvonplaten)
- [Pegasus] Refactor Tokenizer #8731 (@patrickvonplaten)
- [CI] implement job skipping for doc-only PRs #8826 (@stas00)
- Migration guide from v3.x to v4.x #8763 (@LysandreJik)
- Add T5 Encoder for Feature Extraction #8717 (@agemagician)
- token-classification: use is_world_process_zero instead of is_world_master() #8828 (@stefan-it)
- Correct docstring. #8845 (@Fraser-Greenlee)
- Add a direct link to the big table #8850 (@sgugger)
- Use model.from_pretrained for DataParallel also #8795 (@shaie)
- Remove deprecated evalutate_during_training #8852 (@sgugger)
- Attempt to fix Flax CI error(s) #8829 (@mfuntowicz)
- NerPipeline (TokenClassification) now outputs offsets of words #8781 (@Narsil)
- [s2s trainer] fix DP mode #8823 (@stas00)
- Ctrl for sequence classification #8812 (@elk-cloner)
- Fix docstring for language code in mBart #8848 (@RQuispeC)
- 2 typos in modeling_rag.py #8676 (@ratthachat)
- Make the big table creation/check platform independent #8856 (@sgugger)
- Prevent BatchEncoding from blindly passing casts down to the tensors it contains #8860 (@Craigacp)
- Better warning when loading a tokenizer with AutoTokenizer w/o Sneten… #8881 (@LysandreJik)
- [CI] skip docs-only jobs take #2 #8853 (@stas00)
- Better support for resuming training #8878 (@sgugger)
- Add a parallel_mode property to TrainingArguments #8877 (@sgugger)
- [trainer] start using training_args.parallel_mode #8882 (@stas00)
- [ci] skip doc jobs take #3 #8885 (@stas00)
- Transfoxl seq classification #8868 (@spatil6)
- Warning about too long input for fast tokenizers too #8799 (@Narsil)
- [trainer] improve code readability #8903 (@stas00)
- [PyTorch] Refactor Resize Token Embeddings #8880 (@patrickvonplaten)
- Don't warn that models aren't available if Flax is available. #8841 (@skye)
- Avoid erasing the attention mask when double padding #8915 (@sgugger)
- Fix move when the two cache folders exist #8917 (@sgugger)
- Tweak wording + Add badge w/ number of models on the hub #8914 (@julien-c)
- [s2s finetune_trainer] add instructions for distributed training #8884 (@stas00)
- Better booleans handling in the TF models #8777 (@jplu)
- Fix TF T5 only encoder model with booleans #8925 (@LysandreJik)
- [ci] skip doc jobs - circleCI is not reliable - disable skip for now #8926 (@stas00)
- [seq2seq] document the caveat of leaky native amp #8930 (@stas00)
- Don't pass in token_type_ids to BART for GLUE #8929 (@ethanjperez)
- Fix typo for modeling_bert import resulting in ImportError #8931 (@machelreid)
- Fix QA pipeline on Windows #8947 (@sgugger)
- Add TFGPT2ForSequenceClassification based on DialogRPT #8714 (@spatil6)
- Remove sourcerer #8965 (@clmnt)
- Use word_ids to get labels in run_ner #8962 (@sgugger)
- Small fix to the run clm script #8973 (@sgugger)
- Update quicktour docs to showcase the use of truncation #8975 (@navjotts)
- Copyright #8970 (@sgugger)
- Check table as independent script #8976 (@LysandreJik)
- [training] SAVE_STATE_WARNING was removed in pytorch #8979 (@stas00)
- Optional layers #8961 (@jplu)
- Make ModelOutput pickle-able #8989 (@sgugger)
- Fix interaction of return_token_type_ids and add_special_tokens #8854 (@LysandreJik)
- Removed unused encoder_hidden_states and encoder_attention_mask #8972 (@guillaume-be)
- Checking output format + check raises ValueError #8986 (@na...
Patch release: better error message & invalid trainer attribute
Transformers v4.0.0: Fast tokenizers, model outputs, file reorganization
Transformers v4.0.0-rc-1: Fast tokenizers, model outputs, file reorganization
Breaking changes since v3.x
Version v4.0.0 introduces several breaking changes that were necessary.
1. AutoTokenizers and pipelines now use fast (rust) tokenizers by default.
The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set. The main breaking change is the handling of overflowing tokens between the python and rust tokenizers.
How to obtain the same behavior as v3.x in v4.x
- The pipelines now contain additional features out of the box. See the token-classification pipeline with the grouped_entities flag.
- The auto-tokenizers now return rust tokenizers. In order to obtain the python tokenizers instead, the user may use the use_fast flag by setting it to False:
In version v3.x:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("xxx")
To obtain the same in version v4.x:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("xxx", use_fast=False)
2. SentencePiece is removed from the required dependencies
The requirement on the SentencePiece dependency has been lifted from the setup.py. This is done so that we may have a channel on anaconda cloud without relying on conda-forge. This means that the tokenizers that depend on the SentencePiece library will not be available with a standard transformers installation.
This includes the slow versions of:
XLNetTokenizer
AlbertTokenizer
CamembertTokenizer
MBartTokenizer
PegasusTokenizer
T5Tokenizer
ReformerTokenizer
XLMRobertaTokenizer
How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version v3.x, you should install sentencepiece additionally:
In version v3.x:
pip install transformers
To obtain the same in version v4.x:
pip install transformers[sentencepiece]
or
pip install transformers sentencepiece
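Because SentencePiece is now optional, code that relies on one of the slow tokenizers listed above may want to check for the package before using them. The following is a minimal sketch using only the standard library; the helper name is_module_available is hypothetical and not part of transformers.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if is_module_available("sentencepiece"):
    print("sentencepiece found: slow SentencePiece-based tokenizers should be usable")
else:
    print("sentencepiece missing: run `pip install transformers[sentencepiece]`")
```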
3. The architecture of the repo has been updated so that each model resides in its folder
The past and foreseeable addition of new models means that the number of files in the directory src/transformers keeps growing and becomes harder to navigate and understand. We made the choice to put each model and the files accompanying it in their own sub-directories.
This is a breaking change as importing intermediary layers using a model's module directly needs to be done via a different path.
How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version v3.x, you should update the path used to access the layers.
In version v3.x:
from transformers.modeling_bert import BertLayer
To obtain the same in version v4.x:
from transformers.models.bert.modeling_bert import BertLayer
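Code that has to run against both v3.x and v4.x could try the new path first and fall back to the old one. The sketch below is one way to do that with the standard library; import_first_available is a hypothetical helper, not a library function, and the transformers lines are left commented out so the block runs without the library installed.

```python
import importlib
from typing import Any, Sequence


def import_first_available(module_paths: Sequence[str], attribute: str) -> Any:
    """Return `attribute` from the first module path that can be imported."""
    last_error = None
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as error:
            last_error = error  # remember the failure and try the next path
    raise ImportError(f"{attribute} not found in any of {list(module_paths)}") from last_error


# New (v4.x) location first, old (v3.x) location second.
BERT_LAYER_PATHS = [
    "transformers.models.bert.modeling_bert",
    "transformers.modeling_bert",
]
# With transformers installed:
# BertLayer = import_first_available(BERT_LAYER_PATHS, "BertLayer")
```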
4. Switching the return_dict argument to True by default
The return_dict argument enables the return of named-tuple-like python objects containing the model outputs, instead of the standard tuples. This object is self-documented as keys can be used to retrieve values, while also behaving as a tuple as users may retrieve objects by index or by slice.
This is a breaking change as the limitation of that tuple is that it cannot be unpacked: value0, value1 = outputs will not work.
How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version v3.x, you should specify the return_dict argument to False, either in the model configuration or during the forward pass.
In version v3.x:
outputs = model(**inputs)
To obtain the same in version v4.x:
outputs = model(**inputs, return_dict=False)
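To make the unpacking limitation concrete, here is a small stand-in object that mimics the behavior described above: values can be retrieved by key, by index or by slice, but unpacking iterates over the keys. SketchOutput is a hypothetical class written for illustration only. It is not the library's output class, and it assumes that iteration follows ordinary dictionary semantics.

```python
from collections import OrderedDict


class SketchOutput(OrderedDict):
    """Hypothetical stand-in for a model output; not the library class."""

    def __getitem__(self, key):
        if isinstance(key, str):
            return super().__getitem__(key)  # lookup by name
        return self.to_tuple()[key]  # lookup by integer index or slice

    def to_tuple(self):
        """Return the stored values as a plain tuple, in insertion order."""
        return tuple(self[name] for name in self.keys())


outputs = SketchOutput(loss=0.25, logits=[0.1, 0.9])
print(outputs["logits"])  # by key
print(outputs[0])         # by index
print(outputs[:2])        # by slice

first, second = outputs   # unpacking walks the keys, not the values
print(first, second)

loss, logits = outputs.to_tuple()  # explicit conversion restores tuple unpacking
```

In this sketch, converting to a tuple first (or passing return_dict=False as shown above) is what makes v3.x-style unpacking work again.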
5. Removed some deprecated attributes
Attributes that were deprecated have been removed if they had been deprecated for at least a month. The full list of deprecated attributes can be found in #8604.
Here is a list of these attributes/methods/arguments and what their replacements should be:
In several models, the labels become consistent with the other models:
- masked_lm_labels becomes labels in AlbertForMaskedLM and AlbertForPreTraining.
- masked_lm_labels becomes labels in BertForMaskedLM and BertForPreTraining.
- masked_lm_labels becomes labels in DistilBertForMaskedLM.
- masked_lm_labels becomes labels in ElectraForMaskedLM.
- masked_lm_labels becomes labels in LongformerForMaskedLM.
- masked_lm_labels becomes labels in MobileBertForMaskedLM.
- masked_lm_labels becomes labels in RobertaForMaskedLM.
- lm_labels becomes labels in BartForConditionalGeneration.
- lm_labels becomes labels in GPT2DoubleHeadsModel.
- lm_labels becomes labels in OpenAIGPTDoubleHeadsModel.
- lm_labels becomes labels in T5ForConditionalGeneration.
In several models, the caching mechanism becomes consistent with the other models:
- decoder_cached_states becomes past_key_values in all BART-like, FSMT and T5 models.
- decoder_past_key_values becomes past_key_values in all BART-like, FSMT and T5 models.
- past becomes past_key_values in all CTRL models.
- past becomes past_key_values in all GPT-2 models.
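When migrating call sites, the label and cache renames listed above can be applied mechanically to keyword arguments. The sketch below is a hypothetical helper (migrate_kwargs is not part of the library). It applies every rename regardless of model class, which is a simplification: the past rename, for instance, is only listed for CTRL and GPT-2 models.

```python
from typing import Any, Dict

# Old keyword name -> new keyword name, taken from the lists above.
RENAMED_ARGUMENTS = {
    "masked_lm_labels": "labels",
    "lm_labels": "labels",
    "decoder_cached_states": "past_key_values",
    "decoder_past_key_values": "past_key_values",
    "past": "past_key_values",
}


def migrate_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `kwargs` with v3.x argument names replaced by v4.x names."""
    migrated: Dict[str, Any] = {}
    for name, value in kwargs.items():
        new_name = RENAMED_ARGUMENTS.get(name, name)
        if new_name in migrated:
            raise ValueError(f"both old and new spellings given for {new_name!r}")
        migrated[new_name] = value
    return migrated


print(migrate_kwargs({"input_ids": [101, 102], "masked_lm_labels": [101, 102]}))
```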
Regarding the tokenizer classes:
- The tokenizer attribute max_len becomes model_max_length.
- The tokenizer attribute return_lengths becomes return_length.
- The tokenizer encoding argument is_pretokenized becomes is_split_into_words.
Regarding the Trainer class:
- The Trainer argument tb_writer is removed in favor of the callback TensorBoardCallback(tb_writer=...).
- The Trainer argument prediction_loss_only is removed in favor of the class argument args.prediction_loss_only.
- The Trainer attribute data_collator should be a callable.
- The Trainer method _log is deprecated in favor of log.
- The Trainer method _training_step is deprecated in favor of training_step.
- The Trainer method _prediction_loop is deprecated in favor of prediction_loop.
- The Trainer method is_local_master is deprecated in favor of is_local_process_zero.
- The Trainer method is_world_master is deprecated in favor of is_world_process_zero.
Regarding the TFTrainer class:
- The TFTrainer argument prediction_loss_only is removed in favor of the class argument args.prediction_loss_only.
- The TFTrainer method _log is deprecated in favor of log.
- The TFTrainer method _prediction_loop is deprecated in favor of prediction_loop.
- The TFTrainer method _setup_wandb is deprecated in favor of setup_wandb.
- The TFTrainer method _run_model is deprecated in favor of run_model.
Regarding the TrainingArguments and TFTrainingArguments classes:
- The TrainingArguments argument evaluate_during_training is deprecated in favor of evaluation_strategy.
- The TFTrainingArguments argument evaluate_during_training is deprecated in favor of evaluation_strategy.
Regarding the Transfo-XL model:
- The Transfo-XL configuration attribute tie_weight becomes tie_words_embeddings.
- The Transfo-XL modeling method reset_length becomes reset_memory_length.
Regarding pipelines:
- The FillMaskPipeline argument topk becomes top_k.
Model Templates
Version 4.0.0 will be the first to include the experimental feature of model templates. These model templates aim to facilitate the addition of new models to the library by doing most of the work: generating the model/configuration/tokenization/test files that fit the API, with respect to the choice the user has made in terms of naming and functionality.
This release includes a model template for the encoder model (similar to the BERT architecture). Generating a model using the template will generate the files, put them at the appropriate location, reference them throughout the code-base, and generate a working test suite. The user should then only modify the files to their liking, rather than creating the model from scratch.
Feedback welcome, get started from the README here.
- Model templates encoder only #8509 (@LysandreJik)
New model additions
mT5 and T5 version 1.1 (@patrickvonplaten)
The T5v1.1 is an improved version of the original T5 model, see here: https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md
The multilingual T5 model (mT5) was presented in https://arxiv.org/abs/2010.11934 and is based on the T5v1.1 architecture.
Multiple pre-trained checkpoints have been added to the library:
Relevant pull requests:
- T5 & mT5 #8552 (@patrickvonplaten)
- [MT5] More docs #8589 (@patrickvonplaten)
- Fix init for MT5 #8591 (@sgugger)
TF DPR
The DPR model has been added in TensorFlow to match its PyTorch counterpart by @ratthachat
- Add TFDPR #8203 (@ratthachat)
TF Longformer
Additional heads have been added to the TensorFlow Longformer implementation: SequenceClassification, MultipleChoice and TokenClassification
- Tf longformer for sequence classification #8231 (@elk-cloner)
Bug fixes and improvements
Transformers v4.0.0-rc-1: Fast tokenizers, model outputs, file reorganization
Bug fixes and improvements
- [s2s/distill] hparams.tokenizer_name = hparams.teacher #8382 (@ShichaoSun)
- [examples] better PL version check #8429 (@stas00)
- Question template #8440 (@sgugger)
- [docs] improve bart/marian/mBART/pegasus docs #8421 (@sshleifer)
- Add auto next sentence prediction #8432 (@jplu)
- Windows dev section in the contributing file #8436 (@jplu)
- [testing utils] get_auto_remove_tmp_dir more intuitive behavior #8401 (@stas00)
- Add missing import #8444 (@jplu)
- [T5 Tokenizer] Fix t5 special tokens #8435 (@patrickvonplaten)
- using multi_gpu consistently #8446 (@stas00)
- Add missing tasks to pipeline docstring #8428 (@bryant1410)
- [No merge] TF integration testing #7621 (@LysandreJik)
- [T5Tokenizer] fix t5 token type ids #8437 (@patrickvonplaten)
- Bug fix for apply_chunking_to_forward chunking dimension check #8391 (@pedrocolon93)
- Fix TF Longformer #8460 (@jplu)
- Add next sentence prediction loss computation #8462 (@jplu)
- Fix TF next sentence output #8466 (@jplu)
- Example NER script predicts on tokenized dataset #8468 (@sarnoult)
- Replaced unnecessary iadd operations on lists in tokenization_utils.py with proper list methods #8433 (@bombs-kim)
- Flax/Jax documentation #8331 (@mfuntowicz)
- [s2s] distill t5-large -> t5-small #8376 (@sbhaktha)
- Update deploy-docs dependencies on CI to enable Flax #8475 (@mfuntowicz)
- Fix on "examples/language-modeling" to support more datasets #8474 (@zeyuyun1)
- Fix doc bug #8500 (@mymusise)
- Model sharing doc #8498 (@sgugger)
- Fix SqueezeBERT for masked language model #8479 (@forresti)
- Fix logging in the examples #8458 (@jplu)
- Fix check scripts for Windows #8491 (@jplu)
- Add pretraining loss computation for TF Bert pretraining #8470 (@jplu)
- [T5] Bug correction & Refactor #8518 (@patrickvonplaten)
- Model sharing doc: more tweaks #8520 (@julien-c)
- [T5] Fix load weights function #8528 (@patrickvonplaten)
- Rework some TF tests #8492 (@jplu)
- [breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency #8073 (@thomwolf)
- Adding the prepare_seq2seq_batch function to ProphetNet #8515 (@forest1988)
- Update version to v4.0.0-dev #8568 (@sgugger)
- TAPAS tokenizer & tokenizer tests #8482 (@LysandreJik)
- Switch return_dict to True by default. #8530 (@sgugger)
- Fix mixed precision issue for GPT2 #8572 (@jplu)
- Reorganize repo #8580 (@sgugger)
- Tokenizers: ability to load from model subfolder #8586 (@julien-c)
- Fix model templates #8595 (@sgugger)
- [examples tests] tests that are fine on multi-gpu #8582 (@stas00)
- Fix check repo utils #8600 (@sgugger)
- Tokenizers should be framework agnostic #8599 (@LysandreJik)
- Remove deprecated #8604 (@sgugger)
- Fixed link to the wrong paper. #8607 (@cronoik)
- Reset loss to zero on logging in Trainer to avoid bfloat16 issues #8561 (@bminixhofer)
- Fix DataCollatorForLanguageModeling #8621 (@sgugger)
- [s2s] multigpu skip #8613 (@stas00)
- [s2s] fix finetune.py to adjust for #8530 changes #8612 (@stas00)
- tf_bart typo - self.self.activation_dropout #8611 (@ratthachat)
- New TF loading weights #8490 (@jplu)
- Adding PrefixConstrainedLogitsProcessor #8529 (@nicola-decao)
- [Tokenizer Doc] Improve tokenizer summary #8622 (@patrickvonplaten)
- Fixes the training resuming with gradient accumulation #8624 (@sgugger)
- Fix training from scratch in new scripts #8623 (@sgugger)
- [s2s] distillation apex breaks return_dict obj #8631 (@stas00)
- Updated the Extractive Question Answering code snippe...
v3.5.1
v3.5.0: Model versioning, TensorFlow encoder-decoder models, new scripts, refactor of the `generate` method
Model versioning, TensorFlow encoder-decoder models, new scripts, refactor of the generate method
Model versioning
We host more and more of the community's models which is awesome ❤️. To scale this sharing, we needed to change the infra to both support more models, and unlock new powerful features.
To that effect, we have rebuilt the storage backend that we use for models (currently S3), to our own git repos (using S3 as a git-lfs endpoint for large files), with one model = one repo.
The benefits of this switch are:
- built-in versioning (I mean… it’s git. It’s pretty much what you use for versioning. Versioning in S3 has a ton of limitations)
- access control (will unlock private models, private datasets, etc)
- scalability (our usage of S3 to maintain lists of models was starting to bottleneck)
Let's dive into the actual changes:
I. On the website
You'll now see a "Browse files and versions" tab or button on each model page. (design is not final, we'll make it more prominent/streamlined in the near future)
This is what this page looks like:
The UX should look familiar and self-explanatory, but we'll add more ML-specific features in the future.
You can:
- see commit histories and diffs of changes made to any text file, like config.json:
- changes made by the HuggingFace team will be way clearer – we can perform updates to the models to ensure they work well with the library(ies) (you'll be able to opt out from those changes)
- Large binary files are stored using https://git-lfs.github.com/ which is pretty standard now, and interoperable out of the box with git
- Ability to update your text files, like your README.md model card, directly on the website!
- with instant preview 🔥
II. In the transformers library
The PR to enable this new storage mode in the transformers library is available here: #8324
This PR has two parts:
1. changes to the file downloading code used in from_pretrained() methods to use the new file URLs.
Large files are stored in an S3 bucket and served by Cloudfront so downloads should be as fast as they are right now.
In addition, you now have a way to pin a specific version of a model, to a commit hash, tag or branch.
For instance:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "julien-c/EsperBERTo-small",
    revision="v2.0.1",  # tag name, or branch name, or commit hash
)
Finally, the networking code is more robust and doesn't gobble up errors anymore, so in case you have trouble downloading a specific file you'll know exactly why.
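To show how a pinned revision can end up in a file URL, here is a minimal sketch. The URL template, the default revision name "main" and the helper build_file_url are all assumptions made for illustration; the downloading code in #8324 is the authoritative reference.

```python
from typing import Optional

# Assumed URL layout: one repo per model, one path segment for the revision.
URL_TEMPLATE = "https://huggingface.co/{model_id}/resolve/{revision}/{filename}"


def build_file_url(model_id: str, filename: str, revision: Optional[str] = None) -> str:
    """Build the download URL of one file at a given tag, branch or commit hash."""
    if revision is None:
        revision = "main"  # assumed name of the default branch
    return URL_TEMPLATE.format(model_id=model_id, revision=revision, filename=filename)


print(build_file_url("julien-c/EsperBERTo-small", "config.json", revision="v2.0.1"))
```

Pinning to a commit hash rather than a branch name is the variant that stays stable if the model repo is later updated.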
2. changes to the model upload CLI to create a model repo then be able to git clone and git push to it.
We are intentionally not wrapping git too much because we expect most model authors to be familiar with git (and possibly git-lfs). Let us know if that is not the case.
To create a repo:
transformers-cli repo create your-model-name
Then you'll get a repo url that you'll be able to clone:
git clone https://huggingface.co/username/your-model-name
# Then commit as usual
cd your-model-name
echo "hello" >> README.md
git add . && git commit -m "Update from $USER"
A nice side effect of the new system on the upload side is that file uploading should be more robust for very large files (hello T5!) as git-lfs handles the networking code.
By the way, again, every model is its own repo. So you can git clone any public model if you'd like:
git clone https://huggingface.co/gpt2
But you won't be able to push unless it's one of your models (or one of your orgs').
III. Backward compatibility
- Backward compatibility on model downloads is expected, because even though the new models will be stored in huggingface.co-hosted git repos, we will backport all file changes to S3 automatically.
⚠️ Model uploads using the current system won't work anymore: you'll need to upgrade your transformers installation to the next release, v3.5.0, or to build from master.
Alternatively, in the next week or so we'll add the ability to create a repo from the website directly so you'll be able to push even without the transformers library.
TFMarian, TFMbart, TFPegasus, TFBlenderbot
- Add tensorflow 2.0 functionality for SOTA seq2seq transformers #7987 (@sshleifer)
New and updated scripts
We're working on giving examples on how to leverage the 🤗 Datasets library and the Trainer API. Those scripts are meant as examples that are easy to customize, with lots of comments explaining the various steps. The following tasks are now covered:
- Text classification : New run glue script #7917 (@sgugger)
- Causal Language Modeling: New run_clm script #8105 (@sgugger)
- Masked Language Modeling: Add line by line option to mlm/plm scripts #8240 (@sgugger)
- Token classification: Add new token classification example #8340 (@sgugger)
Seq2Seq Trainer
A child of Trainer specialized for training seq2seq models, from @patil-suraj, @stas00 and @sshleifer. Accessible through examples/seq2seq/finetune_trainer.py. API is similar to examples/seq2seq/finetune.py, but API support is better. Example scripts are in examples/seq2seq/builtin_trainer.
- [seq2seq testing] multigpu test run via subprocess #7281 (@stas00)
- [s2s trainer] tests to use distributed on multi-gpu machine #7965 (@stas00)
- [Seq2Seq] Allow EncoderDecoderModels to be trained with Seq2Seq #7809 (@patrickvonplaten)
- [Seq2Seq Trainer] Make sure padding is implemented for models without pad_token #8043 (@patrickvonplaten)
- [Seq2SeqTrainer] Move import to init to make file self-contained #8194 (@patrickvonplaten)
- [s2s test] cleanup #8131 (@stas00)
- [Seq2Seq] Correct import in Seq2Seq Trainer #8254 (@patrickvonplaten)
- [Seq2Seq] Make Seq2SeqArguments an independent file #8267 (@patrickvonplaten)
- [Seq2SeqDataCollator] dont pass add_ prefix_space=False to all tokenizers #8329 (@sshleifer)
Seq2Seq Testing and Documentation Improvements
- [s2s] create doc for pegasus/fsmt replication #7934 (@stas00)
- [s2s] test_distributed_eval #8315 (@stas00)
- [s2s] test_bash_script.py - actually learn something #8318 (@stas00)
- [s2s examples test] fix data path #8398 (@stas00)
- [s2s test_finetune_trainer] failing multigpu test #8400 (@stas00)
- [s2s/distill] remove run_distiller.sh, fix xsum script #8412 (@sshleifer)
Docs for DistillBART Paper Replication
Re-run experiments from the paper here
- [s2s] distillBART docs for paper replication #8150 (@sshleifer)
Refactoring the generate() function
The generate() method now has a new design so that the user can directly call upon the methods sample(), greedy_search(), beam_search() and beam_sample(). The code was made more readable, and beam search was sped up by approximately 5-10%.
- Refactoring the generate() function #6949 (@patrickvonplaten)
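To make the difference between the decoding strategies named above concrete, here is a minimal, library-free sketch of greedy search and beam search over a toy next-token table. The table, the function signatures and the token names are assumptions made up for illustration; this is not the library's implementation, and the real methods take tensors and additional arguments.

```python
import math

# Hypothetical toy "model": log-probabilities of the next token given the last one.
# The numbers are invented purely for illustration.
TOY_LOGPROBS = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"x": math.log(0.55), "y": math.log(0.45)},
    "b": {"x": math.log(0.9), "y": math.log(0.1)},
    "x": {"</s>": 0.0},
    "y": {"</s>": 0.0},
}


def next_logprobs(sequence):
    """Return the next-token log-probabilities for the current sequence."""
    return TOY_LOGPROBS[sequence[-1]]


def greedy_search(start="<s>", eos="</s>", max_length=10):
    """Always append the single most likely next token."""
    seq = [start]
    while seq[-1] != eos and len(seq) < max_length:
        scores = next_logprobs(seq)
        seq.append(max(scores, key=scores.get))
    return seq


def beam_search(num_beams=2, start="<s>", eos="</s>", max_length=10):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [(0.0, [start])]
    for _ in range(max_length - 1):
        if all(seq[-1] == eos for _, seq in beams):
            break
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos:
                # Finished hypotheses are carried over unchanged.
                candidates.append((score, seq))
                continue
            for token, logprob in next_logprobs(seq).items():
                candidates.append((score + logprob, seq + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
    return beams[0][1]


print(greedy_search())
print(beam_search(num_beams=2))
```

In this toy table, greedy search commits to "a" because it is the best first token, while beam search with two beams keeps "b" alive and ends up with a sequence of higher overall probability.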
Notebooks
- added qg evaluation notebook #7958 (@zolekode)
- adding beginner-friendly notebook on text classification with DistilBERT/TF #7964 (@peterbayerle)
- [Notebooks] Add new encoder-decoder notebooks #8246 (@patrickvonplaten)
General improvements and bugfixes
- Respect the 119 line chars #7928 (@LysandreJik)
- PPL guide code snippet minor fix #7938 (@joeddav)
- [ProphetNet] Add Question Generation Model + Test #7942 (@patrickvonplaten)
- [multiple models] skip saving/loading deterministic state_dict keys #7878 (@stas00)
- Add missing comma #7870 (@mrm8488)
- TensorBoard/Wandb/optuna/raytune integration improvements. #7935 (@madlag)
- [ProphetNet] Correct Doc string example #7944 (@patrickvonplaten)
- [GPT2 batch generation] Make test clearer. do_sample=True is not deterministic. #7947 (@patrickvonplaten)
- fix 'encode_plus' docstring for 'special_tokens_mask' (0s and 1s were reversed) #7949 (@epwalsh)
- Herbert tokenizer auto load #7968 (@rmroczkowski)
- [testing] slow tests should be marked as slow #7895 (@stas00)
- support relative path for best_model_checkpoint #7973 (@HaebinShin)
- Disable inference API for t5-11b #7978 (@julien-c)
- [fsmt test] basic config test with online model + super tiny model #7860 (@stas00)
- Add whole word mask support for lm fine-tune #7925 (@wlhgtc)
- [PretrainedConfig] Fix save pretrained config for edge case #7943 (@patrickvonplaten)
- GPT2 - Remove else branch adding 0 to the hidden state if token_type_embeds is None. #7977 (@mfuntowicz)
- Fixing the "translation", "translation_XX_to_YY" pipelines. #7975 (@Narsil)
- FillMaskPipeline: support passing top_k on call #7971 (@julien-c)
- Only log total_flos at the end of training #7981 (@sgugger)
- add zero shot pipeline tags & examples #7983 (@joeddav)
- Reload checkpoint #7984 (@sgugger)
- [gh ci] less output ( --durations=50) #7989 (@sshleifer)
- Move NoLayerEmbedTokens #7945 (@sshleifer)
- update zero shot default widget example #7992 (@joeddav)
- [RAG] Handle the case when title is None while loading own datasets #7941 (@lalitpagaria)
- [tests|tokenizers] Refactoring pipelines test backbone - Small tokenizers improvements - General tests speedups #7970 (@thomwolf)
- [Reformer] remove reformer pad_token_id #7991 (@patrickvonplaten)
- Fix BatchEncoding.word_to_tokens for removed tokens #7939 (@n1t0)
- Handling longformer model_type #7990 (@ethanjperez)
- [doc prepare_seq2seq_batch] fix docs #8013 (@patil-suraj)
- [tokenizers] Fixing #8001 - Adding tests on tokenizers serialization #8006 (@thomwolf)
- Add mixed...
ProphetNet, Blenderbot, SqueezeBERT, DeBERTa
ProphetNet
Two new models are released as part of the ProphetNet implementation: ProphetNet and XLM-ProphetNet.
ProphetNet is an encoder-decoder model and can predict n-future tokens for “ngram” language modeling instead of just the next token.
XLM-ProphetNet is an encoder-decoder model with an identical architecture to ProphetNet, but the model was trained on the multi-lingual “wiki100” Wikipedia dump.
The ProphetNet model was proposed in ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou on 13 Jan, 2020.
It was added to the library in PyTorch with the following checkpoints:
microsoft/xprophetnet-large-wiki100-cased-xglue-ntg
microsoft/prophetnet-large-uncased
microsoft/prophetnet-large-uncased-cnndm
microsoft/xprophetnet-large-wiki100-cased
microsoft/xprophetnet-large-wiki100-cased-xglue-qg
Contributions:
- ProphetNet #7157 (@qiweizhen, @patrickvonplaten)
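As a rough illustration of what "predicting n-future tokens" means for the training targets, the sketch below lists, for each position, the next n tokens that would serve as targets. The function name and the list-of-strings representation are assumptions for illustration; it shows only the target layout, not ProphetNet's attention mechanism or loss.

```python
def future_ngram_targets(tokens, n):
    """For each position t, return the next n tokens (t+1 .. t+n) as targets.

    With n=1 this reduces to ordinary next-token prediction.
    Near the end of the sequence fewer than n future tokens are available.
    """
    targets = []
    for t in range(len(tokens) - 1):
        targets.append(tokens[t + 1 : t + 1 + n])
    return targets


sentence = ["the", "cat", "sat", "down"]
print(future_ngram_targets(sentence, n=1))
print(future_ngram_targets(sentence, n=2))
```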
BlenderBot
Blenderbot is an encoder-decoder model for open-domain chat. It uses a standard transformer-based seq2seq architecture.
The Blender chatbot model was proposed in Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
It was added to the library in PyTorch with the following checkpoints:
facebook/blenderbot-90M
facebook/blenderbot-3B
Contributions:
- Blenderbot #7418 (@sshleifer)
SqueezeBERT
The SqueezeBERT model was proposed in SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. It’s a bidirectional transformer similar to the BERT model. The key difference between the BERT architecture and the SqueezeBERT architecture is that SqueezeBERT uses grouped convolutions instead of fully-connected layers for the Q, K, V and FFN layers.
It was added to the library in PyTorch with the following checkpoints:
squeezebert/squeezebert-mnli
squeezebert/squeezebert-uncased
squeezebert/squeezebert-mnli-headless
Contributions:
- SqueezeBERT architecture #7083 (@forresti)
- Fix squeezebert docs #7587 (@LysandreJik)
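To see why replacing fully-connected layers with grouped convolutions saves parameters, the following back-of-the-envelope sketch compares weight counts. The hidden size of 768 and the group count of 4 are assumptions chosen for illustration, and biases are ignored.

```python
def dense_weight_count(d_in, d_out):
    """Weights in a fully-connected layer (bias ignored)."""
    return d_in * d_out


def grouped_conv1d_weight_count(d_in, d_out, groups, kernel_size=1):
    """Weights in a grouped 1D convolution (bias ignored).

    Each output channel only sees d_in / groups input channels.
    """
    if d_in % groups or d_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (d_in // groups) * d_out * kernel_size


hidden = 768  # assumed hidden size, for illustration only
print(dense_weight_count(hidden, hidden))
print(grouped_conv1d_weight_count(hidden, hidden, groups=4))
```

With a kernel size of 1 and a single group, the grouped convolution has exactly as many weights as the dense layer; with g groups the count drops by a factor of g.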
DeBERTa
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google’s BERT model released in 2018 and Facebook’s RoBERTa model released in 2019.
It was added to the library in PyTorch with the following checkpoints:
microsoft/deberta-base
microsoft/deberta-large
Contributions:
- Add DeBERTa model #5929 (@BigBird01)
- Fix DeBERTa integration tests #7729 (@LysandreJik)
Both SentencePiece and Tokenizers are now optional libraries
Support for SentencePiece is now part of the tokenizers library! Thanks to this we now have near-full support of fast tokenizers in the library.
With this new feature, we slightly change the paradigm regarding installation:
- SentencePiece is now an optional dependency, paving the way to a fully-featured conda install in the near future
- Tokenizers is now also an optional dependency, making it possible to install and use the library even when Rust cannot be compiled on the machine.
- [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies #7659 (@thomwolf)
The main __init__ has been improved to always import the same functions and classes. If someone then tries to use a class that requires an optional dependency, an ImportError will be raised at init (with instructions on how to install the missing dependency) #7537 (@sgugger)
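The general pattern behind "always importable, fails at init" can be sketched as follows. This is a generic illustration under stated assumptions, not the library's actual code: the helper names is_available, make_placeholder and export_class, as well as the error message, are hypothetical.

```python
import importlib.util


def is_available(package_name):
    """Check whether a top-level package can be imported, without importing it."""
    return importlib.util.find_spec(package_name) is not None


def make_placeholder(class_name, package_name):
    """Build a stand-in class that raises ImportError with an install hint at init."""

    def __init__(self, *args, **kwargs):
        raise ImportError(
            f"{class_name} requires the {package_name} library, which was not found. "
            f"Install it with: pip install {package_name}"
        )

    return type(class_name, (), {"__init__": __init__})


def export_class(class_name, package_name, real_class_factory):
    """Return the real class if the dependency is present, else a placeholder."""
    if is_available(package_name):
        return real_class_factory()
    return make_placeholder(class_name, package_name)


# Hypothetical example: the dependency name below is deliberately nonexistent.
FakeTokenizer = export_class("FakeTokenizer", "surely_not_installed_pkg_xyz", lambda: object)
try:
    FakeTokenizer()
except ImportError as err:
    print(err)
```

The name can always be imported, so user code and type references do not break; the failure is deferred to the moment the class is actually instantiated.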
Improvements made to the Trainer
The Trainer API has been improved to work with models requiring several labels or returning several outputs, and to have clearer progress tracking. A new TrainerCallback class has been added to allow the user to easily customize the default training loop.
- Remove config assumption in Trainer #7464 (@sgugger)
- Clean the Trainer state #7490 (@sgugger)
- Small QOL improvements to TrainingArguments #7475 (@sgugger)
- Allow nested tensors in predicted logits #7542 (@sgugger)
- Trainer callbacks #7596 (@sgugger)
- Add specific notebook ProgressCalback #7793 (@sgugger)
- Small fixes to NotebookProgressCallback #7813 (@sgugger)
- Add predict step accumulation #7767 (@sgugger)
- Don't use store_xxx on optional bools #7786 (@sgugger)
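The idea of customizing a training loop through callbacks can be sketched with a toy loop in plain Python. Everything here is an assumption for illustration: the Callback base class, the hook names, the state dictionary and the train function are hypothetical and do not reproduce the signatures of the library's TrainerCallback.

```python
class Callback:
    """Hypothetical base class: override only the hooks you need."""

    def on_train_begin(self, state):
        pass

    def on_step_end(self, state):
        pass

    def on_train_end(self, state):
        pass


class LossLogger(Callback):
    """Record (step, loss) pairs as training progresses."""

    def __init__(self):
        self.history = []

    def on_step_end(self, state):
        self.history.append((state["step"], state["loss"]))


class StopAtLoss(Callback):
    """Ask the loop to stop once the loss falls to a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def on_step_end(self, state):
        if state["loss"] <= self.threshold:
            state["should_stop"] = True


def train(losses, callbacks):
    """Toy loop: 'losses' stands in for the per-step results of real training."""
    state = {"step": 0, "loss": None, "should_stop": False}
    for cb in callbacks:
        cb.on_train_begin(state)
    for loss in losses:
        state["step"] += 1
        state["loss"] = loss
        for cb in callbacks:
            cb.on_step_end(state)
        if state["should_stop"]:
            break
    for cb in callbacks:
        cb.on_train_end(state)
    return state


logger = LossLogger()
final_state = train([1.0, 0.5, 0.2, 0.1], [logger, StopAtLoss(0.25)])
print(final_state["step"], logger.history)
```

The loop itself never changes; logging and early stopping are added or removed by editing the list of callbacks.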
Seq2Seq Trainer
A child of Trainer specialized for training seq2seq models, from @patil-suraj and @sshleifer. Accessible through examples/seq2seq/finetune_trainer.py.
- example scripts at examples/seq2seq/builtin_trainer/
- same functionality as examples/seq2seq/finetune.py, but better TPU support.
- [examples/s2s] clean up finetune_trainer #7509 (@patil-suraj)
- [s2s] trainer scripts: Remove --run_name, thanks sylvain! #7521 (@sshleifer)
- [s2s] Adafactor support for builtin trainer #7522 (@sshleifer)
- [s2s] add config params like Dropout in Seq2SeqTrainingArguments #7532 (@patil-suraj)
- Distributed Trainer: 2 little fixes #7461 (@sshleifer)
- [s2sTrainer] test + code cleanup #7467 (@sshleifer)
- Seq2SeqDataset: avoid passing src_lang everywhere #7470 (@amanpreet692)
- [s2strainer] fix eval dataset loading #7477 (@patil-suraj)
- [pseudolabels] cleanup markdown table #7653 (@sshleifer)
Distributed Generation
- You can run model.generate in PyTorch on a large dataset and split the work across multiple GPUs, using examples/seq2seq/run_distributed_eval.py
- [s2s] release pseudolabel links and instructions #7639 (@sshleifer)
- [s2s] Fix t5 warning for distributed eval #7487 (@sshleifer)
- [s2s] fix kwargs style #7488 (@sshleifer)
- [s2s] fix lockfile and peg distillation constants #7545 (@sshleifer)
- [s2s] fix nltk pytest race condition with FileLock #7515 (@sshleifer)
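Splitting generation across several workers comes down to giving each worker a disjoint slice of the dataset and merging the outputs back in the original order. The sketch below simulates this in a single process; the function names and the round-robin sharding scheme are assumptions for illustration and are not taken from run_distributed_eval.py.

```python
def shard_indices(num_examples, world_size, rank):
    """Round-robin assignment: worker `rank` gets examples rank, rank+world_size, ..."""
    return list(range(rank, num_examples, world_size))


def generate_on_rank(examples, world_size, rank, generate_fn):
    """Run generate_fn on this worker's shard, keeping the original indices."""
    return [
        (i, generate_fn(examples[i]))
        for i in shard_indices(len(examples), world_size, rank)
    ]


def gather(per_rank_results):
    """Merge results from all workers and restore the original dataset order."""
    merged = [pair for results in per_rank_results for pair in results]
    merged.sort(key=lambda pair: pair[0])
    return [output for _, output in merged]


# Simulate two workers; str.upper stands in for an actual generation call.
examples = ["a", "b", "c", "d", "e"]
results = [generate_on_rank(examples, 2, rank, str.upper) for rank in range(2)]
print(gather(results))
```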
Notebooks
- Train T5 in Tensoflow 2 Community Notebook #7428 (@HarrisDePerceptron)
General improvements and bugfixes
- remove codecov PR comments #7400 (@sshleifer)
- Get a better error when check_copies fails #7457 (@sgugger)
- Multi-GPU Testing setup #7453 (@LysandreJik)
- Fix LXMERT with DataParallel #7471 (@LysandreJik)
- Number of GPUs for multi-gpu #7472 (@LysandreJik)
- Make transformers install check positive #7473 (@FremyCompany)
- Alphabetize model lists #7478 (@sgugger)
- Bump isort version. #7484 (@sgugger)
- Add forgotten return_dict argument in the docs #7483 (@sgugger)
- Enable pegasus fp16 by clamping large activations #7243 (@sshleifer)
- Update LayoutLM doc #7388 (@Al31415)
- Report Tune metrics in final evaluation #7507 (@krfricke)
- Fix Ray Tune progress_reporter kwarg #7508 (@krfricke)
- [Seq2Seq] Fix a couple of bugs and clean examples #7474 (@patrickvonplaten)
- [Attention Mask] Fix data type #7513 (@patrickvonplaten)
- Fix seq2seq example test #7518 (@sgugger)
- Remove labels from the RagModel example #7560 (@sgugger)
- added script for fine-tuning roberta for sentiment analysis task #7505 (@DhavalTaunk08)
- LayoutLM: add exception handling for bbox values #7452 (@Al31415)
- Cleanup documentation for BART, Marian, MBART and Pegasus #7523 (@sgugger)
- Add Electra unexpected keys #7569 (@LysandreJik)
- Fix tokenization in SQuAD for RoBERTa, Longformer, BART #7387 (@tholor)
- docs(pretrained_models): fix num parameters #7575 (@amineabdaoui)
- Update Code example according to deprecation of AutoModeWithLMHead #7555 (@jshamg)
- Allow soft dependencies in the namespace with ImportErrors at use #7537 (@sgugger)
- Fix post_init of some TrainingArguments #7525 (@sgugger)
- Check and update model list in index.rst automatically #7527 (@sgugger)
- Expand test to locate flakiness #7580 (@sgugger)
- Custom TF weights loading #7422 (@jplu)
- Documentation fixes #7585 (@sgugger)
- Documentation framework toggle should stick #7586 (@LysandreJik)
- Support T5 Distillation w/hidden state supervision #7599 (@sshleifer)
- [makefile] check only .py files #7588 (@stas00)
- [TF generation] Fix typo #7582 (@SidJain1412)
- change return dicitonary for DataCollatorForNextSentencePrediction from masked_lm_labels to labels #7595 (@gmihaila)
- Docker GPU Images: Add NVIDIA/apex to the cuda images with pytorch #7598 (@AdrienDS)
- typo fix #7611 (@agemagician)
- [bart] fix config.classif_dropout #7593 (@sshleifer)
- [s2s] save first batch to json for debugging purposes #6810 (@sshleifer)
- Add GPT2ForSequenceClassification based on DialogRPT #7501 (@LysandreJik)
- Fix wrong reference name/filename in docstring of SquadProcessor #7616 (@phiyodr)
- Fix tokenizer UnboundLocalError when padding is set to PaddingStrategy.MAX_LENGTH #7610 (@GabrielePicco)
- Add GPT2 to sequence classification auto model #7630 (@LysandreJik)
- Replaced torch.load for loading the pretrained vocab of TransformerXL tokenizer to pickle.load #6935 (@w4nderlust)
- Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer #7141 (@thomwolf)
- Green tests: update torch-hub test dependencies (add protobuf and pin tokenizer 0.9.0-RC2) #7658 (@thomwolf)
- Fix RobertaForCausalLM docs #7642 (@LysandreJik)
- [s2s] configure lr_scheduler from command line #7641 (@patil-suraj)
- [pseudo] Switch URLS to CDN #7661 (@sshleifer)
- [s2s] Switch README urls to cdn #7670 (@sshleifer)
- fix nn.DataParallel compatibility with PyTorch 1.5 #7671 (@guhur)
- Update XLM-RoBERTa pretrained model details #7669 (@noahtren)
- Fix dataset cardinality #7678 (@jplu)
- [pegasus] Faster ...
v3.3.1
RAG
RAG Model
The RAG model is a retrieval-augmented generation model that can be leveraged for question-answering tasks using RagTokenForGeneration or RagSequenceForGeneration as proposed in Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
It was added to the library in PyTorch with the following checkpoints:
facebook/rag-token-nq
facebook/rag-sequence-nq
facebook/rag-token-base
facebook/rag-sequence-base
Contributions:
- RAG #6813 (@ola13)
- [RAG] Add attention_mask to RAG generate #7373 (@patrickvonplaten)
- [RAG] Add missing doc and attention_mask to rag #7382 (@patrickvonplaten)
- [Rag] Fix wrong usage of num_beams and bos_token_id in Rag Sequence generation #7386 (@patrickvonplaten)
- [RAG] Fix retrieval offset in RAG's HfIndex and better integration tests #7372 (@lhoestq)
- [RAG] Remove dependency on examples/seq2seq from rag #7395 (@ola13)
- [Rag] fix rag retriever save_pretrained method #7399 (@patrickvonplaten)
- [RAG] Clean Rag readme in examples #7413 (@ola13)
- [RAG] Model cards - clean cards #7420 (@patrickvonplaten)
- Document RAG again #7377 (@sgugger)
Bug fixes and improvements
- Mark big downloads slow #7325 (@sgugger)
- [Bug Fix] The actual batch_size is inconsistent with the settings. #7235 (@HuangLianzhe)
- Fixed results of SQuAD-FR evaluation #7313 (@psorianom)
- [s2s] add supported architecures to MD #7252 (@sshleifer)
- Add num workers cli arg #7322 (@chadykamar)
- [s2s] add src_lang kwarg for distributed eval #7300 (@sshleifer)
- [s2s] only save metrics.json from rank zero #7331 (@sshleifer)
- [code quality] fix confused flake8 #7309 (@stas00)
- [testing] skip decorators: docs, tests, bugs #7334 (@stas00)
- Fixed evaluation_strategy on epoch end bug #7340 (@WissamAntoun)
- Models doc #7345 (@sgugger)
- Ensure that integrations are imported before transformers or ml libs #7330 (@dsblank)
- [Benchmarks] Change all args to from no_... to their positive form #7075 (@fmcurti)
- Remove reference to args in XLA check #7344 (@ZeroCool2u)
- wip: Code to add lang tags to marian model cards #6586 (@sshleifer)
- Expand a bit the documentation doc #7350 (@sgugger)
- Check decorator order #7326 (@sgugger)
- Update modeling_tf_longformer.py #7359 (@Line290)
- Updata tokenization_auto.py #6870 (@hjptriplebee)
- Update the TF models to remove their interdependencies #7238 (@jplu)
- Make PyTorch model files independent from each other #7352 (@sgugger)
- Clean RAG docs and template docs #7348 (@sgugger)
- Fixing case in which Trainer hung while saving model in distributed training #7365 (@TevenLeScao)
- Formatter #7368 (@LysandreJik)
- [seq2seq] make it easier to run the scripts #7274 (@stas00)
- Remove mentions of RAG from the docs #7376 (@sgugger)
- [fsmt] build/test scripts #7257 (@stas00)
- [s2s] distributed eval allows num_return_sequences > 1 #7254 (@sshleifer)
- Seq2SeqTrainer #6769 (@patil-suraj)
- modeling_bart: 3 small cleanups that dont change outputs #7381 (@sshleifer)
- Check config type using type instead of isinstance #7363 (@LysandreJik)
- [s2s, examples] minor doc changes #7385 (@patil-suraj)
- Remove unhelpful bart warning #7391 (@sshleifer)
- [code quality] new make target that combines style and quality targets #7310 (@stas00)
- Speedup check_copies script #7394 (@sgugger)
- Fix BartModel output documentation #7390 (@sgugger)
- Fix FP16 and attention masks in FunnelTransformer #7374 (@sgugger)
- [Longformer, Bert, Roberta, ...] Fix multi gpu training #7272 (@patrickvonplaten)
- [s2s] add create student script #7290 (@patil-suraj)
- [s2s] rougeLSum expects \n between sentences #7410 (@sshleifer)
- [T5] allow config.decoder_layers to control decoer size #7409 (@sshleifer)
- Flos fix #7384 (@marrrcin)
- Catch PyTorch warning when saving/loading scheduler #7401 (@sgugger)
- Pull request template #7392 (@LysandreJik)
- Reorganize documentation navbar #7423 (@sgugger)
Bert Seq2Seq models, FSMT, LayoutLM, Funnel Transformer, LXMERT
BERT Seq2seq models
The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using EncoderDecoderModel as proposed in Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
It was added to the library in PyTorch with the following checkpoints:
google/roberta2roberta_L-24_bbc
google/roberta2roberta_L-24_gigaword
google/roberta2roberta_L-24_cnn_daily_mail
google/roberta2roberta_L-24_discofuse
google/roberta2roberta_L-24_wikisplit
google/bert2bert_L-24_wmt_de_en
google/bert2bert_L-24_wmt_en_de
Contributions:
- Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. #6594 (@patrickvonplaten)
FSMT (FairSeq MachineTranslation)
FSMT (FairSeq MachineTranslation) models were introduced in Facebook FAIR’s WMT19 News Translation Task Submission by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.
It was added to the library in PyTorch, with the following checkpoints:
facebook/wmt19-en-ru
facebook/wmt19-en-de
facebook/wmt19-ru-en
facebook/wmt19-de-en
Contributions:
- [ported model] FSMT (FairSeq MachineTranslation) #6940 (@stas00)
- build/eval/gen-card scripts for fsmt #7155 (@stas00)
- skip failing FSMT CUDA tests until investigated #7220 (@stas00)
- [fsmt] rewrite SinusoidalPositionalEmbedding + USE_CUDA test fixes + new TranslationPipeline test #7224 (@stas00)
- [s2s] adjust finetune + test to work with fsmt #7263 (@stas00)
- [fsmt] SinusoidalPositionalEmbedding no need to pass device #7292 (@stas00)
- Adds FSMT to LM head AutoModel #7312 (@LysandreJik)
LayoutLM
The LayoutLM model was proposed in LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. It’s a simple but effective pre-training method of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding.
It was added to the library in PyTorch with the following checkpoints:
layoutlm-base-uncased
layoutlm-large-uncased
Contributions:
- Add LayoutLM Model #7064 (@liminghao1630)
- Fixes for LayoutLM #7318 (@sgugger)
Funnel Transformer
The Funnel Transformer model was proposed in the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. It is a bidirectional transformer model, like BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks (CNN) in computer vision.
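The effect of pooling between blocks of layers can be illustrated with a small, library-free sketch: each pooling step averages neighbouring token vectors, so the sequence the later layers process gets shorter. The stride of 2, the use of mean pooling, and the function names are assumptions for illustration; this is not the model's actual implementation.

```python
import math


def mean_pool(sequence, stride=2):
    """Average every `stride` consecutive token vectors into one vector."""
    pooled = []
    for start in range(0, len(sequence), stride):
        window = sequence[start : start + stride]
        dim = len(window[0])
        pooled.append([sum(vec[i] for vec in window) / len(window) for i in range(dim)])
    return pooled


def pooled_lengths(seq_len, num_pools, stride=2):
    """Sequence length before pooling and after each of `num_pools` pooling steps."""
    lengths = [seq_len]
    for _ in range(num_pools):
        seq_len = math.ceil(seq_len / stride)
        lengths.append(seq_len)
    return lengths


hidden_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool(hidden_states))
print(pooled_lengths(128, num_pools=3))
```

Because self-attention cost grows with sequence length, each halving reduces the work done by the layers that follow.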
It was added to the library in both PyTorch and TensorFlow, with the following checkpoints:
funnel-transformer/small
funnel-transformer/small-base
funnel-transformer/medium
funnel-transformer/medium-base
funnel-transformer/intermediate
funnel-transformer/intermediate-base
funnel-transformer/large
funnel-transformer/large-base
funnel-transformer/xlarge
funnel-transformer/xlarge-base
Contributions:
LXMERT
The LXMERT model was proposed in LXMERT: Learning Cross-Modality Encoder Representations from Transformers by Hao Tan & Mohit Bansal. It is a series of bidirectional transformer encoders (one for the vision modality, one for the language modality, and then one to fuse both modalities) pre-trained using a combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives. The pretraining consists of multiple multi-modal datasets: MSCOCO, Visual-Genome + Visual-Genome Question Answering, VQA 2.0, and GQA.
It was added to the library in TensorFlow with the following checkpoints:
unc-nlp/lxmert-base-uncased
unc-nlp/lxmert-vqa-uncased
unc-nlp/lxmert-gqa-uncased
Contributions:
- Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models #5793 (@eltoto1219)
- [LXMERT] Fix tests on gpu #6946 (@patrickvonplaten)
New pipelines
The following pipeline was added to the library:
- [pipelines] Text2TextGenerationPipeline #6744 (@patil-suraj)
Notebooks
The following community notebooks were contributed to the library:
- Demoing LXMERT with raw images by incorporating the FRCNN model for roi-pooled extraction and bounding-box predction on the GQA answer set. #6986 (@eltoto1219)
- [Community notebooks] Add notebook on fine-tuning GPT-2 Model with Trainer Class #7005 (@philschmid)
- Add "Fine-tune ALBERT for sentence-pair classification" notebook to the community notebooks #7255 (@NadirEM)
- added multilabel text classification notebook using distilbert to community notebooks #7201 (@DhavalTaunk08)
Encoder-decoder architectures
An additional encoder-decoder architecture was added:
- [EncoderDecoder] Add xlm-roberta to encoder decoder #6878 (@patrickvonplaten)
Bug fixes and improvements
- TF Flaubert w/ pre-norm #6841 (@LysandreJik)
- Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task #6644 (@HuangLianzhe)
- Fix in Adafactor docstrings #6845 (@sgugger)
- Fix resuming training for Windows #6847 (@sgugger)
- Only access loss tensor every logging_steps #6802 (@jysohn23)
- Marian distill scripts + integration test #6799 (@sshleifer)
- Add checkpointing to Ray Tune HPO #6747 (@krfricke)
- Split hp search methods #6857 (@sgugger)
- Update ONNX notebook to include section on quantization. #6831 (@mfuntowicz)
- Fix marian slow test #6854 (@sshleifer)
- [s2s] command line args for faster val steps #6833 (@sshleifer)
- Bart can make decoder_input_ids from labels #6758 (@sshleifer)
- add a final report to all pytest jobs #6861 (@stas00)
- Logging doc #6852 (@sgugger)
- Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. #6875 (@mfuntowicz)
- [Generate] Facilitate PyTorch generate using ModelOutputs #6735 (@patrickvonplaten)
- Add cache_dir to save features TextDataset #6879 (@jysohn23)
- [Docs, Examples] Fix QA example for PT #6890 (@patrickvonplaten)
- Update modeling_bert.py #6897 (@parthe)
- [Electra] fix warning for position ids #6884 (@patrickvonplaten)
- minor docs grammar fixes #6889 (@harrywang)
- Fix error class instantiation #6634 (@tamuhey)
- Output attention takes an s #6903 (@sgugger)
- [testing] fix ambiguous test #6898 (@stas00)
- test_tf_common: remove un_used mixin class parameters #6866 (@PuneethaPai)
- Template updates #6914 (@sgugger)
- Changed link to the correct paper in the second paragraph #6905 (@sengl)
- tweak tar command in readme #6919 (@brettkoonce)
- [s2s]: script to convert pl checkpoints to hf checkpoints #6911 (@sshleifer)
- [s2s] allow task_specific_params=summarization_xsum #6923 (@sshleifer)
- move wandb/comet logger init to train() to allow parallel logging #6850 (@krfricke)
- [s2s] use --eval_beams command line arg #6926 (@sshleifer)
- [s2s] support early stopping based on loss, rather than rouge #6927 (@sshleifer)
- Fix mixed precision issue in TF DistilBert #6915 (@chiapas)
- [docstring] misc arg doc corrections #6932 (@stas00)
- [s2s] distill: --normalize_hidden --supervise_forward #6834 (@sshleifer)
- [s2s] run_eval.py parses generate_kwargs #6948 (@sshleifer)
- [doc] remove the implied defaults to :obj:`None`, s/True/ :obj:`True/, etc. #6956 (@stas00)
- [s2s] warn if --fp16 for torch 1.6 #6977 (@sshleifer)
- feat: allow prefix for any generative model #5885 (@borisdayma)
- Trainer with grad accum #6930 (@sgugger)
- Cannot index None #6984 (@LysandreJik)
- [docstring] missing arg #6933 (@stas00)
- [testing] add dependency: parametrize #6958 (@stas00)
- Fixed the default number of attention heads in Reformer Configuration #6973 (@tznurmin)
- [gen utils] missing else case #6980 (@stas00)
- match CI's version of flake8 #6941 (@stas00)
- Conversion scripts shouldn't have relative imports #6991 (@LysandreJik)
- Add missing arguments for BertWordPieceTokenizer #5810 (@monologg)
- fixed trainer tr_loss memory leak #6999 (@StuartMesham)
- Floating-point operations logging in trainer #6768 (@TevenLeScao)
- Fixing FLOPS merge by checking if torch is available #7013 (@LysandreJik)
- [Longformer] Fix longformer documentation #7016 (@patrickvonplaten)
- pegasus.rst: fix expected output #7017 (@sshleifer)
- adding TRANSFORMERS_VERBOSITY env var #6961 (@stas00)
- [generation] consistently add eos tokens #6982 (@stas00)
- [from_pretrained] Allow tokenizer_type ≠ model_type #6995 (@julien-c)
- replace torch.triu with onnx compatible code #6929 (@HenryDashwood)
- Batch encore plus and overflowing tokens fails when non existing overflowing tokens for a sequence #6677 (@LysandreJik)
- add -y to bypass prompt for transformers-cli upload #7035 (@stas00)
- Fix confusing warnings during TF2 import from PyTorch #6623 (@jcrocholl)
- Albert pretrain datasets/ datacollator #6168 (@yl-to)
- Fix template #7040 (@LysandreJik)
- Small fixes in tf template #7044 (@sgugger)
- Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. #6594 (@patrickvonplaten)
- fix to ensure that returned tensors after the tokenization is Long #7039 (@GeetDsa)
- [BertGeneration] Correct Doc Title #7048 (@patrickvonplaten)
- [BertGeneration, Docs] Fix another old name in docs #7050 (@patrickvonplaten)
- [xlm tok] config dict: fix str into int to match definition #7034 (@stas00)
- [s2s] --eval_max_generate_length #7018 (@sshleifer)
- Fix CI w...