Releases: huggingface/pytorch-image-models
Release v0.9.6
Aug 28, 2023
- Add dynamic img size support to models in `vision_transformer.py`, `vision_transformer_hybrid.py`, `deit.py`, and `eva.py` w/o breaking backward compat.
  - Add `dynamic_img_size=True` to args at model creation time to allow changing the grid size (interpolate abs and/or ROPE pos embed each forward pass).
  - Add `dynamic_img_pad=True` to allow image sizes that aren't divisible by patch size (pad bottom right to patch size each forward pass).
  - Enabling either dynamic mode will break FX tracing unless PatchEmbed module added as leaf.
  - Existing method of resizing position embedding by passing different `img_size` (interpolate pretrained embed weights once) on creation still works.
  - Existing method of changing `patch_size` (resize pretrained patch_embed weights once) on creation still works.
  - Example validation cmd: `python validate.py /imagenet --model vit_base_patch16_224 --amp --amp-dtype bfloat16 --img-size 255 --crop-pct 1.0 --model-kwargs dynamic_img_size=True dynamic_img_pad=True`
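As a rough illustration of what padding to the patch size means for the `--img-size 255` example above, the sketch below computes the bottom/right padding and the resulting patch grid in plain Python. The helper names `pad_to_patch` and `grid_size` are hypothetical and not part of `timm`; the library does the equivalent work on tensors during the forward pass.

```python
def pad_to_patch(height, width, patch_size):
    """Return (pad_bottom, pad_right) needed to reach a multiple of patch_size."""
    pad_h = (patch_size - height % patch_size) % patch_size
    pad_w = (patch_size - width % patch_size) % patch_size
    return pad_h, pad_w


def grid_size(height, width, patch_size):
    """Patch grid (rows, cols) after bottom/right padding."""
    pad_h, pad_w = pad_to_patch(height, width, patch_size)
    return (height + pad_h) // patch_size, (width + pad_w) // patch_size


# A 255 x 255 input with a patch-16 model, as in the validation command.
print(pad_to_patch(255, 255, 16))  # (1, 1)
print(grid_size(255, 255, 16))     # (16, 16)
```

An input that is already divisible by the patch size gets zero padding, so the helper is a no-op for the usual 224 x 224 case.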
Aug 25, 2023
- Many new models since last release
  - FastViT - https://arxiv.org/abs/2303.14189
  - MobileOne - https://arxiv.org/abs/2206.04040
  - InceptionNeXt - https://arxiv.org/abs/2303.16900
  - RepGhostNet - https://arxiv.org/abs/2211.06088 (thanks https://github.com/ChengpengChen)
  - GhostNetV2 - https://arxiv.org/abs/2211.12905 (thanks https://github.com/yehuitang)
  - EfficientViT (MSRA) - https://arxiv.org/abs/2305.07027 (thanks https://github.com/seefun)
  - EfficientViT (MIT) - https://arxiv.org/abs/2205.14756 (thanks https://github.com/seefun)
- Add `--reparam` arg to `benchmark.py`, `onnx_export.py`, and `validate.py` to trigger layer reparameterization / fusion for models with any one of `reparameterize()`, `switch_to_deploy()` or `fuse()`
  - Including FastViT, MobileOne, RepGhostNet, EfficientViT (MSRA), RepViT, RepVGG, and LeViT
- Preparing 0.9.6 'back to school' release
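The "any one of `reparameterize()`, `switch_to_deploy()` or `fuse()`" rule can be pictured as a duck-typed walk over the module tree. The following is a minimal sketch on plain Python objects, not the `timm` implementation: `reparameterize_tree`, `Branchy`, and `Net` are hypothetical names, and `children` is a simple list here rather than the `nn.Module.children()` method.

```python
def reparameterize_tree(module):
    """Recursively call the first fusion hook found on each node (hypothetical helper)."""
    for child in getattr(module, "children", []):
        reparameterize_tree(child)
    for hook in ("fuse", "reparameterize", "switch_to_deploy"):
        method = getattr(module, hook, None)
        if callable(method):
            method()
            break  # only one hook per node
    return module


class Branchy:
    """Stand-in for a train-time multi-branch block."""

    def __init__(self):
        self.children = []
        self.deployed = False

    def switch_to_deploy(self):
        self.deployed = True


class Net:
    """Container with no fusion hook of its own."""

    def __init__(self):
        self.children = [Branchy(), Branchy()]


net = reparameterize_tree(Net())
print([c.deployed for c in net.children])  # [True, True]
```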
Aug 11, 2023
- Swin, MaxViT, CoAtNet, and BEiT models support resizing of image/window size on creation with adaptation of pretrained weights
- Example validation cmd to test w/ non-square resize: `python validate.py /imagenet --model swin_base_patch4_window7_224.ms_in22k_ft_in1k --amp --amp-dtype bfloat16 --input-size 3 256 320 --model-kwargs window_size=8,10 img_size=256,320`
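One way to see why `window_size=8,10` pairs with `img_size=256,320` is to check that the window tiles the feature map at every stage. The sketch below assumes the standard Swin layout (patch size 4, four stages, 2x downsampling between stages); `stage_grids` and `window_fits` are hypothetical helpers, and `timm` may handle non-dividing windows differently from this simple check.

```python
def stage_grids(img_size, patch_size=4, num_stages=4):
    """Feature-map (H, W) per stage, assuming 2x downsampling between stages."""
    h, w = img_size[0] // patch_size, img_size[1] // patch_size
    grids = []
    for _ in range(num_stages):
        grids.append((h, w))
        h, w = h // 2, w // 2
    return grids


def window_fits(img_size, window_size, patch_size=4, num_stages=4):
    """True if the window tiles every stage's feature map without remainder."""
    return all(
        h % window_size[0] == 0 and w % window_size[1] == 0
        for h, w in stage_grids(img_size, patch_size, num_stages)
    )


print(stage_grids((256, 320)))           # [(64, 80), (32, 40), (16, 20), (8, 10)]
print(window_fits((256, 320), (8, 10)))  # True
print(window_fits((256, 320), (7, 7)))   # False
```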
Release v0.9.5
Minor updates and bug fixes. New ResNeXt w/ highest ImageNet eval I'm aware of in the ResNe(X)t family (`seresnextaa201d_32x8d.sw_in12k_ft_in1k_384`)
Aug 3, 2023
- Add GluonCV weights for HRNet w18_small and w18_small_v2. Converted by SeeFun
- Fix `selecsls*` model naming regression
- Patch and position embedding for ViT/EVA works for bfloat16/float16 weights on load (or activations for on-the-fly resize)
- v0.9.5 release prep
July 27, 2023
- Added timm trained `seresnextaa201d_32x8d.sw_in12k_ft_in1k_384` weights (and `.sw_in12k` pretrain) with 87.3% top-1 on ImageNet-1k, best ImageNet ResNet family model I'm aware of.
- RepViT model and weights (https://arxiv.org/abs/2307.09283) added by wangao
- I-JEPA ViT feature weights (no classifier) added by SeeFun
- SAM-ViT (segment anything) feature weights (no classifier) added by SeeFun
- Add support for alternative feat extraction methods and negative indices to EfficientNet
- Add NAdamW optimizer
- Misc fixes
Release v0.9.2
- Fix _hub deprecation pass through import
Release v0.9.1
The first non pre-release since Oct 2022 with a long list of changes from 0.6.x releases...
May 12, 2023
- Fix Python 3.7 import error re Final[] typing annotation
May 11, 2023
- `timm` 0.9 released, transition from 0.8.xdev releases
May 10, 2023
- Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in `timm`
- DINOv2 vit feature backbone weights added thanks to Leng Yue
- FB MAE vit feature backbone weights added
- OpenCLIP DataComp-XL L/14 feat backbone weights added
- MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
- Experimental `get_intermediate_layers` function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
- Model creation throws error if `pretrained=True` and no weights exist (instead of continuing with random initialization)
- Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
- bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use `bnb` prefix, e.g. `bnbadam8bit`
- Misc cleanup and fixes
- Final testing before switching to a 0.9 and bringing `timm` out of pre-release state
April 27, 2023
- 97% of `timm` models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
- Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.
April 21, 2023
- Gradient accumulation support added to train script and tested (`--grad-accum-steps`), thanks Taeksang Kim
- More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
- Added `--head-init-scale` and `--head-init-bias` to train.py to scale classifier head and set fixed bias for fine-tune
- Remove all InplaceABN (`inplace_abn`) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).
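For readers unfamiliar with gradient accumulation, the idea is that averaging gradients over several equal-sized micro-batches before one optimizer step matches the gradient of the combined batch. The sketch below shows this on a scalar model in plain Python; it is not the train script's code, and `grad_mse` and `accumulated_grad` are hypothetical helpers.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Average gradients over equal-sized micro-batches before a single step."""
    chunks = [data[i:i + micro_batch_size] for i in range(0, len(data), micro_batch_size)]
    total = 0.0
    for chunk in chunks:
        # Scale each micro-batch gradient by 1 / number of accumulation steps.
        total += grad_mse(w, chunk) / len(chunks)
    return total


data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data, micro_batch_size=2)  # 2 accumulation steps
print(abs(full - accum) < 1e-12)  # True
```

The equivalence holds for losses averaged per sample; layers that depend on batch statistics, such as BatchNorm, still see the smaller micro-batch.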
April 12, 2023
- Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
- Refactor dropout args for vit and vit-like models, separate drop_rate into `drop_rate` (classifier dropout), `proj_drop_rate` (block mlp / out projections), `pos_drop_rate` (position embedding drop), `attn_drop_rate` (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
- fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control, and config interface to enable/disable
- Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT model that was missed.
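Patch dropout, as used in FLIP, trains on a random subset of patch tokens. The sketch below shows the idea on a plain list of tokens, under the assumptions that prefix tokens (such as a class token) are always kept and that the surviving patches stay in their original order. `patch_dropout` is a hypothetical helper and does not mirror the `timm` module's signature.

```python
import random


def patch_dropout(tokens, drop_prob, num_prefix_tokens=1, rng=None):
    """Randomly drop a fraction of patch tokens, always keeping prefix tokens."""
    rng = rng or random.Random()
    prefix = tokens[:num_prefix_tokens]
    patches = tokens[num_prefix_tokens:]
    num_keep = max(1, int(len(patches) * (1.0 - drop_prob)))
    # Sample indices without replacement, then restore the original order.
    keep = sorted(rng.sample(range(len(patches)), num_keep))
    return prefix + [patches[i] for i in keep]


tokens = ["cls"] + [f"p{i}" for i in range(196)]  # 14 x 14 patches plus a class token
kept = patch_dropout(tokens, drop_prob=0.5, rng=random.Random(0))
print(len(kept))  # 99
```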
April 5, 2023
- ALL ResNet models pushed to Hugging Face Hub with multi-weight support
  - All past `timm` trained weights added with recipe based tags to differentiate
  - All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
  - Add torchvision v2 recipe weights to existing torchvision originals
  - See comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
- New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
  - `resnetaa50d.sw_in12k_ft_in1k` - 81.7 @ 224, 82.6 @ 288
  - `resnetaa101d.sw_in12k_ft_in1k` - 83.5 @ 224, 84.1 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k` - 86.0 @ 224, 86.5 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k_288` - 86.5 @ 288, 86.7 @ 320
March 31, 2023
- Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
model | top1 | top5 | img_size | param_count | gmacs | macts |
---|---|---|---|---|---|---|
convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 |
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 |
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 |
convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 |
convnext_base.clip_laion2b_augreg_ft_in12k_in1k | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 |
- Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.
model | top1 | top5 | param_count | img_size |
---|---|---|---|---|
eva02_large_patch14_448.mim_m38m_ft_in22k_in1k | 90.054 | 99.042 | 305.08 | 448 |
eva02_large_patch14_448.mim_in22k_ft_in22k_in1k | 89.946 | 99.01 | 305.08 | 448 |
eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.792 | 98.992 | 1014.45 | 560 |
eva02_large_patch14_448.mim_in22k_ft_in1k | 89.626 | 98.954 | 305.08 | 448 |
eva02_large_patch14_448.mim_m38m_ft_in1k | 89.57 | 98.918 | 305.08 | 448 |
eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.56 | 98.956 | 1013.01 | 336 |
eva_giant_patch14_336.clip_ft_in1k | 89.466 | 98.82 | 1013.01 | 336 |
eva_large_patch14_336.in22k_ft_in22k_in1k | 89.214 | 98.854 | 304.53 | 336 |
eva_giant_patch14_224.clip_ft_in1k | 88.882 | 98.678 | 1012.56 | 224 |
eva02_base_patch14_448.mim_in22k_ft_in22k_in1k | 88.692 | 98.722 | 87.12 | 448 |
eva_large_patch14_336.in22k_ft_in1k | 88.652 | 98.722 | 304.53 | 336 |
eva_large_patch14_196.in22k_ft_in22k_in1k | 88.592 | 98.656 | 304.14 | 196 |
eva02_base_patch14_448.mim_in22k_ft_in1k | 88.23 | 98.564 | 87.12 | 448 |
eva_large_patch14_196.in22k_ft_in1k | 87.934 | 98.504 | 304.14 | 196 |
eva02_small_patch14_336.mim_in22k_ft_in1k | 85.74 | 97.614 | 22.13 | 336 |
eva02_tiny_patch14_336.mim_in22k_ft_in1k | 80.658 | 95.524 | 5.76 | 336 |
- Multi-weight and HF hub for DeiT and MLP-Mixer based models
March 22, 2023
- More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
- Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs.
- FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
- RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
- More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
  - `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
  - `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
  - `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
  - `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
  - `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
- Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
- Minor bug fixes and improvements.
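The model name deprecation + remapping idea can be pictured as a lookup that swaps an old name for its replacement and warns the caller. This is only a sketch: `_DEPRECATED_NAMES`, `resolve_model_name`, and the `examplenet_*` names are invented for illustration and do not reflect the actual `timm` mapping table or API.

```python
import warnings

# Hypothetical mapping of old names to new 'architecture.tag' names.
_DEPRECATED_NAMES = {
    "examplenet_small_in21k": "examplenet_small.in21k",
}


def resolve_model_name(name):
    """Return the current name for `name`, warning if it was deprecated."""
    new_name = _DEPRECATED_NAMES.get(name)
    if new_name is None:
        return name
    warnings.warn(
        f"'{name}' is deprecated, use '{new_name}' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name


print(resolve_model_name("examplenet_small_in21k"))  # examplenet_small.in21k
print(resolve_model_name("examplenet_base"))         # examplenet_base
```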
Feb 26, 2023
- Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
- Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
- 0.8.15dev0
Feb 20, 2023
- Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
- 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
Feb 16, 2023
- `safetensor` checkpoint support added
- Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
- Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to `vit_*`, `vit_relpos*`, `coatnet` / `maxxvit` (to start)
- Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
- gradient checkpointing works with `features_only=True`
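Of the ideas listed above, RmsNorm is the easiest to show in a few lines: it rescales a vector by its root mean square without subtracting the mean. The sketch below works on a plain Python list; `rms_norm` is a hypothetical helper, and the `eps` default is an assumption rather than a value taken from `timm`.

```python
import math


def rms_norm(x, weight=None, eps=1e-6):
    """Scale x by 1 / sqrt(mean(x^2) + eps), then apply an optional per-element gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)
    return [v * scale * g for v, g in zip(x, weight)]


out = rms_norm([3.0, 4.0])
# After normalization the mean of squares is approximately 1.
print(sum(v * v for v in out) / len(out))
```

Applied separately to the query and key vectors before the attention dot product, the same kind of normalization is the gist of qk norm.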
Feb 7, 2023
- New inference benchmark numbers added in results folder.
- Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
  - `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  - `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  - `co...
Release v0.9.0
First non pre-release in a loooong while, changelog from 0.6.x below...
May 11, 2023
- `timm` 0.9 released, transition from 0.8.xdev releases
May 10, 2023
- Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in `timm`
- DINOv2 vit feature backbone weights added thanks to Leng Yue
- FB MAE vit feature backbone weights added
- OpenCLIP DataComp-XL L/14 feat backbone weights added
- MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
- Experimental `get_intermediate_layers` function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
- Model creation throws error if `pretrained=True` and no weights exist (instead of continuing with random initialization)
- Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
- bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use `bnb` prefix, e.g. `bnbadam8bit`
- Misc cleanup and fixes
- Final testing before switching to a 0.9 and bringing `timm` out of pre-release state
April 27, 2023
- 97% of `timm` models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
- Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.
April 21, 2023
- Gradient accumulation support added to train script and tested (`--grad-accum-steps`), thanks Taeksang Kim
- More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
- Added `--head-init-scale` and `--head-init-bias` to train.py to scale classifier head and set fixed bias for fine-tune
- Remove all InplaceABN (`inplace_abn`) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).
April 12, 2023
- Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
- Refactor dropout args for vit and vit-like models, separate drop_rate into `drop_rate` (classifier dropout), `proj_drop_rate` (block mlp / out projections), `pos_drop_rate` (position embedding drop), `attn_drop_rate` (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
- fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control, and config interface to enable/disable
- Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT model that was missed.
April 5, 2023
- ALL ResNet models pushed to Hugging Face Hub with multi-weight support
  - All past `timm` trained weights added with recipe based tags to differentiate
  - All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
  - Add torchvision v2 recipe weights to existing torchvision originals
  - See comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
- New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
  - `resnetaa50d.sw_in12k_ft_in1k` - 81.7 @ 224, 82.6 @ 288
  - `resnetaa101d.sw_in12k_ft_in1k` - 83.5 @ 224, 84.1 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k` - 86.0 @ 224, 86.5 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k_288` - 86.5 @ 288, 86.7 @ 320
March 31, 2023
- Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
model | top1 | top5 | img_size | param_count | gmacs | macts |
---|---|---|---|---|---|---|
convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 |
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 |
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 |
convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 |
convnext_base.clip_laion2b_augreg_ft_in12k_in1k | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 |
- Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.
model | top1 | top5 | param_count | img_size |
---|---|---|---|---|
eva02_large_patch14_448.mim_m38m_ft_in22k_in1k | 90.054 | 99.042 | 305.08 | 448 |
eva02_large_patch14_448.mim_in22k_ft_in22k_in1k | 89.946 | 99.01 | 305.08 | 448 |
eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.792 | 98.992 | 1014.45 | 560 |
eva02_large_patch14_448.mim_in22k_ft_in1k | 89.626 | 98.954 | 305.08 | 448 |
eva02_large_patch14_448.mim_m38m_ft_in1k | 89.57 | 98.918 | 305.08 | 448 |
eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.56 | 98.956 | 1013.01 | 336 |
eva_giant_patch14_336.clip_ft_in1k | 89.466 | 98.82 | 1013.01 | 336 |
eva_large_patch14_336.in22k_ft_in22k_in1k | 89.214 | 98.854 | 304.53 | 336 |
eva_giant_patch14_224.clip_ft_in1k | 88.882 | 98.678 | 1012.56 | 224 |
eva02_base_patch14_448.mim_in22k_ft_in22k_in1k | 88.692 | 98.722 | 87.12 | 448 |
eva_large_patch14_336.in22k_ft_in1k | 88.652 | 98.722 | 304.53 | 336 |
eva_large_patch14_196.in22k_ft_in22k_in1k | 88.592 | 98.656 | 304.14 | 196 |
eva02_base_patch14_448.mim_in22k_ft_in1k | 88.23 | 98.564 | 87.12 | 448 |
eva_large_patch14_196.in22k_ft_in1k | 87.934 | 98.504 | 304.14 | 196 |
eva02_small_patch14_336.mim_in22k_ft_in1k | 85.74 | 97.614 | 22.13 | 336 |
eva02_tiny_patch14_336.mim_in22k_ft_in1k | 80.658 | 95.524 | 5.76 | 336 |
- Multi-weight and HF hub for DeiT and MLP-Mixer based models
March 22, 2023
- More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
- Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs.
- FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
- RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
- More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
  - `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
  - `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
  - `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
  - `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
  - `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
- Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
- Minor bug fixes and improvements.
Feb 26, 2023
- Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
- Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
- 0.8.15dev0
Feb 20, 2023
- Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
- 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
Feb 16, 2023
- `safetensor` checkpoint support added
- Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
- Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to `vit_*`, `vit_relpos*`, `coatnet` / `maxxvit` (to start)
- Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
- gradient checkpointing works with `features_only=True`
Feb 7, 2023
- New inference benchmark numbers added in results folder.
- Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
  - `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  - `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
- Add DaViT models. Supports ...
Release v0.6.13
Release from 0.6.x stable branch with fix for Python 3.11. NOTE original 0.6.13 release tag was against wrong branch.
Release v0.8.17dev0
March 22, 2023
- More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
- Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs.
- FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
- RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
- More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
  - `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
  - `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
  - `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
  - `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
  - `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
- Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
- Minor bug fixes and improvements.
Feb 26, 2023
- Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
- Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
- 0.8.15dev0
v0.8.13dev0 Release
Feb 20, 2023
- Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
- 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
Feb 16, 2023
- `safetensor` checkpoint support added
- Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
- Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to `vit_*`, `vit_relpos*`, `coatnet` / `maxxvit` (to start)
- Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
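For context on the Lion optimizer, its update takes only the sign of an interpolation between the momentum and the current gradient, then refreshes the momentum with a second coefficient. The sketch below follows the update rule from the linked paper for a single scalar parameter; `lion_step` and `sign` are hypothetical helpers, not the `timm` optimizer class, and the multi-tensor option is not represented.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update for scalar weight w, gradient g, and momentum m."""
    update = sign(beta1 * m + (1 - beta1) * g)    # direction from interpolated momentum
    w_new = w - lr * (update + weight_decay * w)  # decoupled weight decay
    m_new = beta2 * m + (1 - beta2) * g           # momentum refreshed after the step
    return w_new, m_new


w, m = lion_step(w=1.0, g=0.5, m=0.0, lr=0.1)
print(w, m)
```

Because every step has magnitude `lr` regardless of gradient scale, Lion is usually run with a smaller learning rate than AdamW.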
v0.8.10dev0 Release
Feb 7, 2023
- New inference benchmark numbers added in results folder.
- Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
  - `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  - `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
- Add DaViT models. Supports `features_only=True`. Adapted from https://github.com/dingmyu/davit by Fredo.
- Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
- Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
  - New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports `features_only=True`.
  - Minor updates to EfficientFormer.
  - Refactor LeViT models to stages, add `features_only=True` support to new `conv` variants, weight remap required.
- Move ImageNet meta-data (synsets, indices) from `/results` to `timm/data/_info`.
- Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in `timm`
  - Update `inference.py` to use, try: `python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5`
- Ready for 0.8.10 pypi pre-release (final testing).
Jan 20, 2023
- Add two convnext 12k -> 1k fine-tunes at 384x384
  - `convnext_tiny.in12k_ft_in1k_384` - 85.1 @ 384
  - `convnext_small.in12k_ft_in1k_384` - 86.2 @ 384
- Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for `rw` base MaxViT and CoAtNet 1/2 models
model | top1 | top5 | samples / sec | Params (M) | GMAC | Act (M) |
---|---|---|---|---|---|---|
maxvit_xlarge_tf_512.in21k_ft_in1k | 88.53 | 98.64 | 21.76 | 475.77 | 534.14 | 1413.22 |
maxvit_xlarge_tf_384.in21k_ft_in1k | 88.32 | 98.54 | 42.53 | 475.32 | 292.78 | 668.76 |
maxvit_base_tf_512.in21k_ft_in1k | 88.20 | 98.53 | 50.87 | 119.88 | 138.02 | 703.99 |
maxvit_large_tf_512.in21k_ft_in1k | 88.04 | 98.40 | 36.42 | 212.33 | 244.75 | 942.15 |
maxvit_large_tf_384.in21k_ft_in1k | 87.98 | 98.56 | 71.75 | 212.03 | 132.55 | 445.84 |
maxvit_base_tf_384.in21k_ft_in1k | 87.92 | 98.54 | 104.71 | 119.65 | 73.80 | 332.90 |
maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.81 | 98.37 | 106.55 | 116.14 | 70.97 | 318.95 |
maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.47 | 98.37 | 149.49 | 116.09 | 72.98 | 213.74 |
coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k | 87.39 | 98.31 | 160.80 | 73.88 | 47.69 | 209.43 |
maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.89 | 98.02 | 375.86 | 116.14 | 23.15 | 92.64 |
maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.64 | 98.02 | 501.03 | 116.09 | 24.20 | 62.77 |
maxvit_base_tf_512.in1k | 86.60 | 97.92 | 50.75 | 119.88 | 138.02 | 703.99 |
coatnet_2_rw_224.sw_in12k_ft_in1k | 86.57 | 97.89 | 631.88 | 73.87 | 15.09 | 49.22 |
maxvit_large_tf_512.in1k | 86.52 | 97.88 | 36.04 | 212.33 | 244.75 | 942.15 |
coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k | 86.49 | 97.90 | 620.58 | 73.88 | 15.18 | 54.78 |
maxvit_base_tf_384.in1k | 86.29 | 97.80 | 101.09 | 119.65 | 73.80 | 332.90 |
maxvit_large_tf_384.in1k | 86.23 | 97.69 | 70.56 | 212.03 | 132.55 | 445.84 |
maxvit_small_tf_512.in1k | 86.10 | 97.76 | 88.63 | 69.13 | 67.26 | 383.77 |
maxvit_tiny_tf_512.in1k | 85.67 | 97.58 | 144.25 | 31.05 | 33.49 | 257.59 |
maxvit_small_tf_384.in1k | 85.54 | 97.46 | 188.35 | 69.02 | 35.87 | 183.65 |
maxvit_tiny_tf_384.in1k | 85.11 | 97.38 | 293.46 | 30.98 | 17.53 | 123.42 |
maxvit_large_tf_224.in1k | 84.93 | 96.97 | 247.71 | 211.79 | 43.68 | 127.35 |
coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k | 84.90 | 96.96 | 1025.45 | 41.72 | 8.11 | 40.13 |
maxvit_base_tf_224.in1k | 84.85 | 96.99 | 358.25 | 119.47 | 24.04 | 95.01 |
maxxvit_rmlp_small_rw_256.sw_in1k | 84.63 | 97.06 | 575.53 | 66.01 | 14.67 | 58.38 |
coatnet_rmlp_2_rw_224.sw_in1k | 84.61 | 96.74 | 625.81 | 73.88 | 15.18 | 54.78 |
maxvit_rmlp_small_rw_224.sw_in1k | 84.49 | 96.76 | 693.82 | 64.90 | 10.75 | 49.30 |
maxvit_small_tf_224.in1k | 84.43 | 96.83 | 647.96 | 68.93 | 11.66 | 53.17 |
maxvit_rmlp_tiny_rw_256.sw_in1k | 84.23 | 96.78 | 807.21 | 29.15 | 6.77 | 46.92 |
coatnet_1_rw_224.sw_in1k | 83.62 | 96.38 | 989.59 | 41.72 | 8.04 | 34.60 |
maxvit_tiny_rw_224.sw_in1k | 83.50 | 96.50 | 1100.53 | 29.06 | 5.11 | 33.11 |
maxvit_tiny_tf_224.in1k | 83.41 | 96.59 | 1004.94 | 30.92 | 5.60 | 35.78 |
coatnet_rmlp_1_rw_224.sw_in1k | 83.36 | 96.45 | 1093.03 | 41.69 | 7.85 | 35.47 |
maxxvitv2_nano_rw_256.sw_in1k | 83.11 | 96.33 | 1276.88 | 23.70 | 6.26 | 23.05 |
maxxvit_rmlp_nano_rw_256.sw_in1k | 83.03 | 96.34 | 1341.24 | 16.78 | 4.37 | 26.05 |
maxvit_rmlp_nano_rw_256.sw_in1k | 82.96 | 96.26 | 1283.24 | 15.50 | 4.47 | 31.92 |
maxvit_nano_rw_256.sw_in1k | 82.93 | 96.23 | 1218.17 | 15.45 | 4.46 | 30.28 |
coatnet_bn_0_rw_224.sw_in1k | 82.39 | 96.19 | 1600.14 | 27.44 | 4.67 | 22.04 |
coatnet_0_rw_224.sw_in1k | 82.39 | 95.84 | 1831.21 | 27.44 | 4.43 | 18.73 |
coatnet_rmlp_nano_rw_224.sw_in1k | 82.05 | 95.87 | 2109.09 | 15.15 | 2.62 | 20.34 |
coatnext_nano_rw_224.sw_in1k | 81.95 | 95.92 | 2525.52 | 14.70 | 2.47 | 12.80 |
coatnet_nano_rw_224.sw_in1k | 81.70 | 95.64 | 2344.52 | 15.14 | 2.41 | 15.41 |
maxvit_rmlp_pico_rw_256.sw_in1k | 80.... |
v0.8.6dev0 Release
Jan 11, 2023
- Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT `.in12k` tags)
  - `convnext_nano.in12k_ft_in1k` - 82.3 @ 224, 82.9 @ 288 (previously released)
  - `convnext_tiny.in12k_ft_in1k` - 84.2 @ 224, 84.5 @ 288
  - `convnext_small.in12k_ft_in1k` - 85.2 @ 224, 85.3 @ 288
Jan 6, 2023
- Finally got around to adding `--model-kwargs` and `--opt-kwargs` to scripts to pass through rare args directly to model classes from cmd line
  - `train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu`
  - `train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12`
- Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
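To make the `key=value` pass-through concrete, here is a minimal sketch of how such arguments could be turned into a keyword dict: values that parse as Python literals become numbers or tuples, and everything else stays a string. `parse_kwargs` is a hypothetical helper written for illustration, and the scripts' own parsing may differ in detail.

```python
import ast


def parse_kwargs(pairs):
    """Turn ['key=value', ...] into a dict, evaluating literals where possible."""
    out = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        try:
            out[key] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            out[key] = value  # leave non-literals such as 'silu' as strings
    return out


print(parse_kwargs(["output_stride=16", "act_layer=silu"]))
# {'output_stride': 16, 'act_layer': 'silu'}
print(parse_kwargs(["img_size=240", "patch_size=12"]))
# {'img_size': 240, 'patch_size': 12}
```

Under this scheme a comma-separated value such as `8,10` parses as the tuple `(8, 10)`, which suits non-square size arguments.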
Jan 5, 2023
- ConvNeXt-V2 models and weights added to existing `convnext.py`
  - Paper: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
  - Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)