Releases · huggingface/pytorch-image-models

29 Aug 19:06

rwightman

v0.9.6

f544d49

Release v0.9.6

Aug 28, 2023

Add dynamic img size support to models in vision_transformer.py, vision_transformer_hybrid.py, deit.py, and eva.py w/o breaking backward compat.
- Add dynamic_img_size=True to args at model creation time to allow changing the grid size (interpolate abs and/or ROPE pos embed each forward pass).
- Add dynamic_img_pad=True to allow image sizes that aren't divisible by patch size (pad bottom right to patch size each forward pass).
- Enabling either dynamic mode will break FX tracing unless PatchEmbed module added as leaf.
- Existing method of resizing position embedding by passing different img_size (interpolate pretrained embed weights once) on creation still works.
- Existing method of changing patch_size (resize pretrained patch_embed weights once) on creation still works.
- Example validation cmd: python validate.py /imagenet --model vit_base_patch16_224 --amp --amp-dtype bfloat16 --img-size 255 --crop-pct 1.0 --model-kwargs dynamic_img_size=True dynamic_img_pad=True
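
To make the padding rule concrete, here is a minimal sketch of the arithmetic implied by dynamic_img_pad=True: each side is padded on the bottom/right up to the next multiple of the patch size, and the patch grid follows from the padded size. The helper name padded_grid is hypothetical and not part of timm; it only works through the numbers for the --img-size 255 command above.

```python
def padded_grid(height, width, patch_size=16):
    """Return (pad_bottom, pad_right, grid_h, grid_w) for bottom/right padding.

    Hypothetical helper for illustration only; not a timm function.
    """
    # Amount needed to reach the next multiple of patch_size (0 if already aligned).
    pad_bottom = (patch_size - height % patch_size) % patch_size
    pad_right = (patch_size - width % patch_size) % patch_size
    grid_h = (height + pad_bottom) // patch_size
    grid_w = (width + pad_right) // patch_size
    return pad_bottom, pad_right, grid_h, grid_w


print(padded_grid(255, 255))  # (1, 1, 16, 16)
print(padded_grid(224, 224))  # (0, 0, 14, 14)
```

With a 16-pixel patch, a 255x255 input needs one extra row and column, giving a 16x16 grid instead of the 14x14 grid of the pretrained 224x224 model, which is why the position embedding has to be interpolated as well.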

Aug 25, 2023

Many new models since last release
- FastViT - https://arxiv.org/abs/2303.14189
- MobileOne - https://arxiv.org/abs/2206.04040
- InceptionNeXt - https://arxiv.org/abs/2303.16900
- RepGhostNet - https://arxiv.org/abs/2211.06088 (thanks https://github.com/ChengpengChen)
- GhostNetV2 - https://arxiv.org/abs/2211.12905 (thanks https://github.com/yehuitang)
- EfficientViT (MSRA) - https://arxiv.org/abs/2305.07027 (thanks https://github.com/seefun)
- EfficientViT (MIT) - https://arxiv.org/abs/2205.14756 (thanks https://github.com/seefun)
Add --reparam arg to benchmark.py, onnx_export.py, and validate.py to trigger layer reparameterization / fusion for models with any one of reparameterize(), switch_to_deploy() or fuse()
- Including FastViT, MobileOne, RepGhostNet, EfficientViT (MSRA), RepViT, RepVGG, and LeViT
Preparing 0.9.6 'back to school' release
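
The --reparam behaviour described above (trigger whichever of reparameterize(), switch_to_deploy() or fuse() a layer provides) can be sketched with plain Python objects. This is an illustration under assumptions, not timm's implementation: Block, RepBlock, FuseBlock and reparameterize_tree are hypothetical names, the toy classes stand in for torch.nn.Module layers, and the hooks are assumed to modify the layer in place.

```python
REPARAM_METHODS = ("reparameterize", "switch_to_deploy", "fuse")


class Block:
    """Toy stand-in for a layer; real models would be torch.nn.Module instances."""

    def __init__(self, children=()):
        self.children = list(children)
        self.deployed = False


class RepBlock(Block):
    def switch_to_deploy(self):
        self.deployed = True


class FuseBlock(Block):
    def fuse(self):
        self.deployed = True


def reparameterize_tree(module):
    """Call the first available hook on every node and return how many were called."""
    count = 0
    for child in module.children:
        count += reparameterize_tree(child)
    for name in REPARAM_METHODS:
        method = getattr(module, name, None)
        if callable(method):
            method()
            count += 1
            break  # at most one hook per layer
    return count


model = Block([RepBlock(), Block([FuseBlock()]), Block()])
print(reparameterize_tree(model))  # 2
```

Duck-typing on the method name keeps the scripts independent of any one model family, which matches the list of quite different architectures the flag covers.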

Aug 11, 2023

Swin, MaxViT, CoAtNet, and BEiT models support resizing of image/window size on creation with adaptation of pretrained weights
Example validation cmd to test w/ non-square resize: python validate.py /imagenet --model swin_base_patch4_window7_224.ms_in22k_ft_in1k --amp --amp-dtype bfloat16 --input-size 3 256 320 --model-kwargs window_size=8,10 img_size=256,320
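
The window_size=8,10 choice in the command above is not arbitrary. As a rough sketch, assuming the usual Swin layout of a 4x4 patch stem followed by four stages with 2x downsampling between them, the feature grid at every stage should split evenly into windows. The helpers stage_grids and windows_fit are hypothetical, and timm's own checks may differ.

```python
def stage_grids(img_size, patch_size=4, num_stages=4):
    """Feature-map (height, width) at each stage, halving between stages."""
    h, w = img_size[0] // patch_size, img_size[1] // patch_size
    grids = []
    for _ in range(num_stages):
        grids.append((h, w))
        h, w = h // 2, w // 2
    return grids


def windows_fit(img_size, window_size, patch_size=4, num_stages=4):
    """True if every stage grid is an exact multiple of the window size."""
    return all(
        h % window_size[0] == 0 and w % window_size[1] == 0
        for h, w in stage_grids(img_size, patch_size, num_stages)
    )


print(stage_grids((256, 320)))           # [(64, 80), (32, 40), (16, 20), (8, 10)]
print(windows_fit((256, 320), (8, 10)))  # True
print(windows_fit((256, 320), (7, 7)))   # False
```

The pretrained 7x7 window does not divide a 256x320 input evenly, while 8x10 matches the final 8x10 grid exactly.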

03 Aug 23:55

rwightman

v0.9.5

81089b1

Release v0.9.5

Minor updates and bug fixes. New ResNeXt w/ highest ImageNet eval I'm aware of in the ResNe(X)t family (seresnextaa201d_32x8d.sw_in12k_ft_in1k_384)

Aug 3, 2023

Add GluonCV weights for HRNet w18_small and w18_small_v2. Converted by SeeFun
Fix selecsls* model naming regression
Patch and position embedding for ViT/EVA works for bfloat16/float16 weights on load (or activations for on-the-fly resize)
v0.9.5 release prep

July 27, 2023

Added timm trained seresnextaa201d_32x8d.sw_in12k_ft_in1k_384 weights (and .sw_in12k pretrain) with 87.3% top-1 on ImageNet-1k, best ImageNet ResNet family model I'm aware of.
RepViT model and weights (https://arxiv.org/abs/2307.09283) added by wangao
I-JEPA ViT feature weights (no classifier) added by SeeFun
SAM-ViT (segment anything) feature weights (no classifier) added by SeeFun
Add support for alternative feat extraction methods and -ve indices to EfficientNet
Add NAdamW optimizer
Misc fixes

14 May 15:08

rwightman

v0.9.2

3d05c0e

Release v0.9.2

Fix _hub deprecation pass through import

12 May 16:52

rwightman

v0.9.1

cc77096

Release v0.9.1

The first non pre-release since Oct 2022 with a long list of changes from 0.6.x releases...

May 12, 2023

Fix Python 3.7 import error re Final[] typing annotation

May 11, 2023

timm 0.9 released, transition from 0.8.xdev releases

May 10, 2023

Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in timm
DINOv2 vit feature backbone weights added thanks to Leng Yue
FB MAE vit feature backbone weights added
OpenCLIP DataComp-XL L/14 feat backbone weights added
MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
Experimental get_intermediate_layers function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
Model creation throws error if pretrained=True and no weights exist (instead of continuing with random initialization)
Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use bnb prefix, e.g. bnbadam8bit
Misc cleanup and fixes
Final testing before switching to a 0.9 and bringing timm out of pre-release state

April 27, 2023

97% of timm models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.

April 21, 2023

Gradient accumulation support added to train script and tested (--grad-accum-steps), thanks Taeksang Kim
More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
Added --head-init-scale and --head-init-bias to train.py to scale classifier head and set fixed bias for fine-tune
Remove all InplaceABN (inplace_abn) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).
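
For readers unfamiliar with the idea behind --grad-accum-steps, the following is a minimal pure-Python sketch of gradient accumulation in general, not of how train.py implements it. It assumes equally sized micro-batches and a mean-reduced loss; the function names are illustrative only.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, accum_steps):
    """Average the gradients of `accum_steps` equally sized micro-batches."""
    micro = len(xs) // accum_steps
    total = 0.0
    for step in range(accum_steps):
        lo, hi = step * micro, (step + 1) * micro
        # Scale each micro-batch gradient so the sum matches the full-batch mean.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, accum_steps=2)
print(abs(full - accum) < 1e-12)  # True
```

Dividing each micro-batch gradient by the number of accumulation steps makes the accumulated update match a single step on the larger effective batch, so memory use drops without changing the optimization target.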

April 12, 2023

Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
Refactor dropout args for vit and vit-like models, separate drop_rate into drop_rate (classifier dropout), proj_drop_rate (block mlp / out projections), pos_drop_rate (position embedding drop), attn_drop_rate (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
Add fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control, and config interface to enable/disable
Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT model that was missed.

April 5, 2023

ALL ResNet models pushed to Hugging Face Hub with multi-weight support
- All past timm trained weights added with recipe based tags to differentiate
- All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
- Add torchvision v2 recipe weights to existing torchvision originals
- See comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
- resnetaa50d.sw_in12k_ft_in1k - 81.7 @ 224, 82.6 @ 288
- resnetaa101d.sw_in12k_ft_in1k - 83.5 @ 224, 84.1 @ 288
- seresnextaa101d_32x8d.sw_in12k_ft_in1k - 86.0 @ 224, 86.5 @ 288
- seresnextaa101d_32x8d.sw_in12k_ft_in1k_288 - 86.5 @ 288, 86.7 @ 320

March 31, 2023

Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.

model	top1	top5	img_size	param_count	gmacs	macts
convnext_xxlarge.clip_laion2b_soup_ft_in1k	88.612	98.704	256	846.47	198.09	124.45
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384	88.312	98.578	384	200.13	101.11	126.74
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320	87.968	98.47	320	200.13	70.21	88.02
convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384	87.138	98.212	384	88.59	45.21	84.49
convnext_base.clip_laion2b_augreg_ft_in12k_in1k	86.344	97.97	256	88.59	20.09	37.55

Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.

model	top1	top5	param_count	img_size
eva02_large_patch14_448.mim_m38m_ft_in22k_in1k	90.054	99.042	305.08	448
eva02_large_patch14_448.mim_in22k_ft_in22k_in1k	89.946	99.01	305.08	448
eva_giant_patch14_560.m30m_ft_in22k_in1k	89.792	98.992	1014.45	560
eva02_large_patch14_448.mim_in22k_ft_in1k	89.626	98.954	305.08	448
eva02_large_patch14_448.mim_m38m_ft_in1k	89.57	98.918	305.08	448
eva_giant_patch14_336.m30m_ft_in22k_in1k	89.56	98.956	1013.01	336
eva_giant_patch14_336.clip_ft_in1k	89.466	98.82	1013.01	336
eva_large_patch14_336.in22k_ft_in22k_in1k	89.214	98.854	304.53	336
eva_giant_patch14_224.clip_ft_in1k	88.882	98.678	1012.56	224
eva02_base_patch14_448.mim_in22k_ft_in22k_in1k	88.692	98.722	87.12	448
eva_large_patch14_336.in22k_ft_in1k	88.652	98.722	304.53	336
eva_large_patch14_196.in22k_ft_in22k_in1k	88.592	98.656	304.14	196
eva02_base_patch14_448.mim_in22k_ft_in1k	88.23	98.564	87.12	448
eva_large_patch14_196.in22k_ft_in1k	87.934	98.504	304.14	196
eva02_small_patch14_336.mim_in22k_ft_in1k	85.74	97.614	22.13	336
eva02_tiny_patch14_336.mim_in22k_ft_in1k	80.658	95.524	5.76	336

Multi-weight and HF hub for DeiT and MLP-Mixer based models

March 22, 2023

More weights pushed to HF hub along with multi-weight support, including: regnet.py, rexnet.py, byobnet.py, resnetv2.py, swin_transformer.py, swin_transformer_v2.py, swin_transformer_v2_cr.py
Swin Transformer models support feature extraction (NCHW feat maps for swinv2_cr_*, and NHWC for all others) and spatial embedding outputs.
FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
More ImageNet-12k pretrained and 1k fine-tuned timm weights:
- rexnetr_200.sw_in12k_ft_in1k - 82.6 @ 224, 83.2 @ 288
- rexnetr_300.sw_in12k_ft_in1k - 84.0 @ 224, 84.5 @ 288
- regnety_120.sw_in12k_ft_in1k - 85.0 @ 224, 85.4 @ 288
- regnety_160.lion_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288
- regnety_160.sw_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
Minor bug fixes and improvements.

Feb 26, 2023

Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
Update convnext_xxlarge default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
0.8.15dev0

Feb 20, 2023

Add 320x320 convnext_large_mlp.clip_laion2b_ft_320 and convnext_large_mlp.clip_laion2b_ft_soup_320 CLIP image tower weights for features & fine-tune
0.8.13dev0 pypi release for latest changes w/ move to huggingface org

Feb 16, 2023

safetensor checkpoint support added
Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to vit_*, vit_relpos*, coatnet / maxxvit (to start)
Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
gradient checkpointing works with features_only=True
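
As a reminder of what RmsNorm computes, here is a small pure-Python sketch of root-mean-square normalization over one vector: each element is divided by sqrt(mean(x^2) + eps) and scaled by a learned gain. The function rms_norm and the eps default are assumptions for illustration and are not timm's RmsNorm layer.

```python
import math


def rms_norm(x, weight=None, eps=1e-6):
    """Scale x so its root-mean-square is ~1, then apply an optional per-element gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)  # identity gain
    return [v * inv_rms * g for v, g in zip(x, weight)]


out = rms_norm([3.0, 4.0], eps=0.0)
print(out)
```

Unlike LayerNorm, no mean is subtracted and no bias is added, which removes a reduction and a parameter vector per normalization.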

Feb 7, 2023

New inference benchmark numbers added in results folder.
Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
- convnext_base.clip_laion2b_augreg_ft_in1k - 86.2% @ 256x256
- convnext_base.clip_laiona_augreg_ft_in1k_384 - 86.5% @ 384x384
- convnext_large_mlp.clip_laion2b_augreg_ft_in1k - 87.3% @ 256x256
- `co...

12 May 15:34

rwightman

v0.9.0

35b9fc7

Release v0.9.0

First non pre-release in a loooong while, changelog from 0.6.x below...

May 11, 2023

timm 0.9 released, transition from 0.8.xdev releases

May 10, 2023

Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in timm
DINOv2 vit feature backbone weights added thanks to Leng Yue
FB MAE vit feature backbone weights added
OpenCLIP DataComp-XL L/14 feat backbone weights added
MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
Experimental get_intermediate_layers function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
Model creation throws error if pretrained=True and no weights exist (instead of continuing with random initialization)
Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use bnb prefix, e.g. bnbadam8bit
Misc cleanup and fixes
Final testing before switching to a 0.9 and bringing timm out of pre-release state

April 27, 2023

97% of timm models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.

April 21, 2023

Gradient accumulation support added to train script and tested (--grad-accum-steps), thanks Taeksang Kim
More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
Added --head-init-scale and --head-init-bias to train.py to scale classifier head and set fixed bias for fine-tune
Remove all InplaceABN (inplace_abn) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).

April 12, 2023

Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
Refactor dropout args for vit and vit-like models, separate drop_rate into drop_rate (classifier dropout), proj_drop_rate (block mlp / out projections), pos_drop_rate (position embedding drop), attn_drop_rate (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
Add fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control, and config interface to enable/disable
Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT model that was missed.

April 5, 2023

ALL ResNet models pushed to Hugging Face Hub with multi-weight support
- All past timm trained weights added with recipe based tags to differentiate
- All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
- Add torchvision v2 recipe weights to existing torchvision originals
- See comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
- resnetaa50d.sw_in12k_ft_in1k - 81.7 @ 224, 82.6 @ 288
- resnetaa101d.sw_in12k_ft_in1k - 83.5 @ 224, 84.1 @ 288
- seresnextaa101d_32x8d.sw_in12k_ft_in1k - 86.0 @ 224, 86.5 @ 288
- seresnextaa101d_32x8d.sw_in12k_ft_in1k_288 - 86.5 @ 288, 86.7 @ 320

March 31, 2023

Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.

model	top1	top5	img_size	param_count	gmacs	macts
convnext_xxlarge.clip_laion2b_soup_ft_in1k	88.612	98.704	256	846.47	198.09	124.45
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384	88.312	98.578	384	200.13	101.11	126.74
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320	87.968	98.47	320	200.13	70.21	88.02
convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384	87.138	98.212	384	88.59	45.21	84.49
convnext_base.clip_laion2b_augreg_ft_in12k_in1k	86.344	97.97	256	88.59	20.09	37.55

Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.

model	top1	top5	param_count	img_size
eva02_large_patch14_448.mim_m38m_ft_in22k_in1k	90.054	99.042	305.08	448
eva02_large_patch14_448.mim_in22k_ft_in22k_in1k	89.946	99.01	305.08	448
eva_giant_patch14_560.m30m_ft_in22k_in1k	89.792	98.992	1014.45	560
eva02_large_patch14_448.mim_in22k_ft_in1k	89.626	98.954	305.08	448
eva02_large_patch14_448.mim_m38m_ft_in1k	89.57	98.918	305.08	448
eva_giant_patch14_336.m30m_ft_in22k_in1k	89.56	98.956	1013.01	336
eva_giant_patch14_336.clip_ft_in1k	89.466	98.82	1013.01	336
eva_large_patch14_336.in22k_ft_in22k_in1k	89.214	98.854	304.53	336
eva_giant_patch14_224.clip_ft_in1k	88.882	98.678	1012.56	224
eva02_base_patch14_448.mim_in22k_ft_in22k_in1k	88.692	98.722	87.12	448
eva_large_patch14_336.in22k_ft_in1k	88.652	98.722	304.53	336
eva_large_patch14_196.in22k_ft_in22k_in1k	88.592	98.656	304.14	196
eva02_base_patch14_448.mim_in22k_ft_in1k	88.23	98.564	87.12	448
eva_large_patch14_196.in22k_ft_in1k	87.934	98.504	304.14	196
eva02_small_patch14_336.mim_in22k_ft_in1k	85.74	97.614	22.13	336
eva02_tiny_patch14_336.mim_in22k_ft_in1k	80.658	95.524	5.76	336

Multi-weight and HF hub for DeiT and MLP-Mixer based models

March 22, 2023

More weights pushed to HF hub along with multi-weight support, including: regnet.py, rexnet.py, byobnet.py, resnetv2.py, swin_transformer.py, swin_transformer_v2.py, swin_transformer_v2_cr.py
Swin Transformer models support feature extraction (NCHW feat maps for swinv2_cr_*, and NHWC for all others) and spatial embedding outputs.
FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
More ImageNet-12k pretrained and 1k fine-tuned timm weights:
- rexnetr_200.sw_in12k_ft_in1k - 82.6 @ 224, 83.2 @ 288
- rexnetr_300.sw_in12k_ft_in1k - 84.0 @ 224, 84.5 @ 288
- regnety_120.sw_in12k_ft_in1k - 85.0 @ 224, 85.4 @ 288
- regnety_160.lion_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288
- regnety_160.sw_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
Minor bug fixes and improvements.

Feb 26, 2023

Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
Update convnext_xxlarge default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
0.8.15dev0

Feb 20, 2023

Add 320x320 convnext_large_mlp.clip_laion2b_ft_320 and convnext_large_mlp.clip_laion2b_ft_soup_320 CLIP image tower weights for features & fine-tune
0.8.13dev0 pypi release for latest changes w/ move to huggingface org

Feb 16, 2023

safetensor checkpoint support added
Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to vit_*, vit_relpos*, coatnet / maxxvit (to start)
Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
gradient checkpointing works with features_only=True

Feb 7, 2023

New inference benchmark numbers added in results folder.
Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
- convnext_base.clip_laion2b_augreg_ft_in1k - 86.2% @ 256x256
- convnext_base.clip_laiona_augreg_ft_in1k_384 - 86.5% @ 384x384
- convnext_large_mlp.clip_laion2b_augreg_ft_in1k - 87.3% @ 256x256
- convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384 - 87.9% @ 384x384
Add DaViT models. Supports ...

16 Apr 15:27

rwightman

v0.6.13

2696eed

Release v0.6.13

Release from 0.6.x stable branch with fix for Python 3.11. NOTE original 0.6.13 release tag was against wrong branch.

24 Mar 00:59

rwightman

v0.8.17dev0

a089bfb

Release v0.8.17dev0 Pre-release

March 22, 2023

More weights pushed to HF hub along with multi-weight support, including: regnet.py, rexnet.py, byobnet.py, resnetv2.py, swin_transformer.py, swin_transformer_v2.py, swin_transformer_v2_cr.py
Swin Transformer models support feature extraction (NCHW feat maps for swinv2_cr_*, and NHWC for all others) and spatial embedding outputs.
FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
More ImageNet-12k pretrained and 1k fine-tuned timm weights:
- rexnetr_200.sw_in12k_ft_in1k - 82.6 @ 224, 83.2 @ 288
- rexnetr_300.sw_in12k_ft_in1k - 84.0 @ 224, 84.5 @ 288
- regnety_120.sw_in12k_ft_in1k - 85.0 @ 224, 85.4 @ 288
- regnety_160.lion_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288
- regnety_160.sw_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
Minor bug fixes and improvements.

Feb 26, 2023

Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
Update convnext_xxlarge default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
0.8.15dev0

20 Feb 18:26

rwightman

v0.8.13dev0

a0772f0

v0.8.13dev0 Release Pre-release

Feb 20, 2023

Add 320x320 convnext_large_mlp.clip_laion2b_ft_320 and convnext_large_mlp.clip_laion2b_ft_soup_320 CLIP image tower weights for features & fine-tune
0.8.13dev0 pypi release for latest changes w/ move to huggingface org

Feb 16, 2023

safetensor checkpoint support added
Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to vit_*, vit_relpos*, coatnet / maxxvit (to start)
Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)

07 Feb 22:37

rwightman

v0.8.10dev0

1e0b347

v0.8.10dev0 Release Pre-release

Feb 7, 2023

New inference benchmark numbers added in results folder.
Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
- convnext_base.clip_laion2b_augreg_ft_in1k - 86.2% @ 256x256
- convnext_base.clip_laiona_augreg_ft_in1k_384 - 86.5% @ 384x384
- convnext_large_mlp.clip_laion2b_augreg_ft_in1k - 87.3% @ 256x256
- convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384 - 87.9% @ 384x384
Add DaViT models. Supports features_only=True. Adapted from https://github.com/dingmyu/davit by Fredo.
Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
- New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports features_only=True.
- Minor updates to EfficientFormer.
- Refactor LeViT models to stages, add features_only=True support to new conv variants, weight remap required.
Move ImageNet meta-data (synsets, indices) from /results to timm/data/_info.
Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in timm
- Update inference.py to use, try: python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5
Ready for 0.8.10 pypi pre-release (final testing).

Jan 20, 2023

Add two convnext 12k -> 1k fine-tunes at 384x384
- convnext_tiny.in12k_ft_in1k_384 - 85.1 @ 384
- convnext_small.in12k_ft_in1k_384 - 86.2 @ 384
Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for rw base MaxViT and CoAtNet 1/2 models

model	top1	top5	samples / sec	Params (M)	GMAC	Act (M)
maxvit_xlarge_tf_512.in21k_ft_in1k	88.53	98.64	21.76	475.77	534.14	1413.22
maxvit_xlarge_tf_384.in21k_ft_in1k	88.32	98.54	42.53	475.32	292.78	668.76
maxvit_base_tf_512.in21k_ft_in1k	88.20	98.53	50.87	119.88	138.02	703.99
maxvit_large_tf_512.in21k_ft_in1k	88.04	98.40	36.42	212.33	244.75	942.15
maxvit_large_tf_384.in21k_ft_in1k	87.98	98.56	71.75	212.03	132.55	445.84
maxvit_base_tf_384.in21k_ft_in1k	87.92	98.54	104.71	119.65	73.80	332.90
maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k	87.81	98.37	106.55	116.14	70.97	318.95
maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k	87.47	98.37	149.49	116.09	72.98	213.74
coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k	87.39	98.31	160.80	73.88	47.69	209.43
maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k	86.89	98.02	375.86	116.14	23.15	92.64
maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k	86.64	98.02	501.03	116.09	24.20	62.77
maxvit_base_tf_512.in1k	86.60	97.92	50.75	119.88	138.02	703.99
coatnet_2_rw_224.sw_in12k_ft_in1k	86.57	97.89	631.88	73.87	15.09	49.22
maxvit_large_tf_512.in1k	86.52	97.88	36.04	212.33	244.75	942.15
coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k	86.49	97.90	620.58	73.88	15.18	54.78
maxvit_base_tf_384.in1k	86.29	97.80	101.09	119.65	73.80	332.90
maxvit_large_tf_384.in1k	86.23	97.69	70.56	212.03	132.55	445.84
maxvit_small_tf_512.in1k	86.10	97.76	88.63	69.13	67.26	383.77
maxvit_tiny_tf_512.in1k	85.67	97.58	144.25	31.05	33.49	257.59
maxvit_small_tf_384.in1k	85.54	97.46	188.35	69.02	35.87	183.65
maxvit_tiny_tf_384.in1k	85.11	97.38	293.46	30.98	17.53	123.42
maxvit_large_tf_224.in1k	84.93	96.97	247.71	211.79	43.68	127.35
coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k	84.90	96.96	1025.45	41.72	8.11	40.13
maxvit_base_tf_224.in1k	84.85	96.99	358.25	119.47	24.04	95.01
maxxvit_rmlp_small_rw_256.sw_in1k	84.63	97.06	575.53	66.01	14.67	58.38
coatnet_rmlp_2_rw_224.sw_in1k	84.61	96.74	625.81	73.88	15.18	54.78
maxvit_rmlp_small_rw_224.sw_in1k	84.49	96.76	693.82	64.90	10.75	49.30
maxvit_small_tf_224.in1k	84.43	96.83	647.96	68.93	11.66	53.17
maxvit_rmlp_tiny_rw_256.sw_in1k	84.23	96.78	807.21	29.15	6.77	46.92
coatnet_1_rw_224.sw_in1k	83.62	96.38	989.59	41.72	8.04	34.60
maxvit_tiny_rw_224.sw_in1k	83.50	96.50	1100.53	29.06	5.11	33.11
maxvit_tiny_tf_224.in1k	83.41	96.59	1004.94	30.92	5.60	35.78
coatnet_rmlp_1_rw_224.sw_in1k	83.36	96.45	1093.03	41.69	7.85	35.47
maxxvitv2_nano_rw_256.sw_in1k	83.11	96.33	1276.88	23.70	6.26	23.05
maxxvit_rmlp_nano_rw_256.sw_in1k	83.03	96.34	1341.24	16.78	4.37	26.05
maxvit_rmlp_nano_rw_256.sw_in1k	82.96	96.26	1283.24	15.50	4.47	31.92
maxvit_nano_rw_256.sw_in1k	82.93	96.23	1218.17	15.45	4.46	30.28
coatnet_bn_0_rw_224.sw_in1k	82.39	96.19	1600.14	27.44	4.67	22.04
coatnet_0_rw_224.sw_in1k	82.39	95.84	1831.21	27.44	4.43	18.73
coatnet_rmlp_nano_rw_224.sw_in1k	82.05	95.87	2109.09	15.15	2.62	20.34
coatnext_nano_rw_224.sw_in1k	81.95	95.92	2525.52	14.70	2.47	12.80
coatnet_nano_rw_224.sw_in1k	81.70	95.64	2344.52	15.14	2.41	15.41
maxvit_rmlp_pico_rw_256.sw_in1k	80....

12 Jan 05:36

rwightman

v0.8.6dev0

a2c14c2

v0.8.6dev0 Release Pre-release

Jan 11, 2023

Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT .in12k tags)
- convnext_nano.in12k_ft_in1k - 82.3 @ 224, 82.9 @ 288 (previously released)
- convnext_tiny.in12k_ft_in1k - 84.2 @ 224, 84.5 @ 288
- convnext_small.in12k_ft_in1k - 85.2 @ 224, 85.3 @ 288

Jan 6, 2023

Finally got around to adding --model-kwargs and --opt-kwargs to scripts to pass through rare args directly to model classes from cmd line
- train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu
- train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12
Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
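
The key=value pass-through used by --model-kwargs can be approximated with a small argparse action. This is a sketch under assumptions rather than the scripts' actual parser: the class name ParseKeyValue is hypothetical, and values are assumed to be Python literals where possible and plain strings otherwise.

```python
import argparse
import ast


class ParseKeyValue(argparse.Action):
    """Collect `key=value` tokens into a dict, literal-evaluating values when possible."""

    def __call__(self, parser, namespace, values, option_string=None):
        kwargs = {}
        for token in values:
            key, _, raw = token.partition("=")
            try:
                kwargs[key] = ast.literal_eval(raw)  # "16" -> 16, "8,10" -> (8, 10)
            except (ValueError, SyntaxError):
                kwargs[key] = raw  # keep plain strings such as "silu"
        setattr(namespace, self.dest, kwargs)


parser = argparse.ArgumentParser()
parser.add_argument("--model-kwargs", nargs="*", default={}, action=ParseKeyValue)

args = parser.parse_args(["--model-kwargs", "output_stride=16", "act_layer=silu"])
print(args.model_kwargs)  # {'output_stride': 16, 'act_layer': 'silu'}
```

Literal evaluation keeps numbers and tuples typed, so the resulting dict can be splatted directly into a model constructor without per-argument flags.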

Jan 5, 2023

ConvNeXt-V2 models and weights added to existing convnext.py
- Paper: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
- Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)
