Releases: huggingface/pytorch-image-models
v0.5.4 - More weights, models. ResNet strikes back, self-attn - convnet hybrids, optimizers and more
Default conv_mlp to False across the board for ConvNeXt, causing issues on more setups than it's improving right now...
v0.1-rsb-weights
Weights for ResNet Strikes Back
Paper: https://arxiv.org/abs/2110.00476
More details on weights and hparams to come...
v0.1-attn-weights
A collection of weights I've trained comparing various SE-like channel attention blocks (SE, ECA, GC, etc.), self-attention blocks (bottleneck, halo, lambda), and related non-attention baselines.
ResNet-26-T series
- [2, 2, 2, 2] repeat Bottleneck block ResNet architecture
- ReLU activations
- 3 layer stem with 24, 32, 64 chs, max-pool
- avg pool in shortcut downsample
- self-attn blocks replace 3x3 in both blocks for last stage, and second block of penultimate stage
model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
---|---|---|---|---|---|---|---|---|
botnet26t_256 | 79.246 | 20.754 | 94.53 | 5.47 | 12.49 | 256 | 0.95 | bicubic |
halonet26t | 79.13 | 20.87 | 94.314 | 5.686 | 12.48 | 256 | 0.95 | bicubic |
lambda_resnet26t | 79.112 | 20.888 | 94.59 | 5.41 | 10.96 | 256 | 0.94 | bicubic |
lambda_resnet26rpt_256 | 78.964 | 21.036 | 94.428 | 5.572 | 10.99 | 256 | 0.94 | bicubic |
resnet26t | 77.872 | 22.128 | 93.834 | 6.166 | 16.01 | 256 | 0.94 | bicubic |
Details:
- HaloNet - 8 pixel block size, 2 pixel halo (overlap), relative position embedding
- BotNet - relative position embedding
- Lambda-ResNet-26-T - 3d lambda conv, kernel = 9
- Lambda-ResNet-26-RPT - relative position embedding
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
---|---|---|---|---|---|---|---|---|---|
resnet26t | 2967.55 | 86.252 | 256 | 256 | 857.62 | 297.984 | 256 | 256 | 16.01 |
botnet26t_256 | 2642.08 | 96.879 | 256 | 256 | 809.41 | 315.706 | 256 | 256 | 12.49 |
halonet26t | 2601.91 | 98.375 | 256 | 256 | 783.92 | 325.976 | 256 | 256 | 12.48 |
lambda_resnet26t | 2354.1 | 108.732 | 256 | 256 | 697.28 | 366.521 | 256 | 256 | 10.96 |
lambda_resnet26rpt_256 | 1847.34 | 138.563 | 256 | 256 | 644.84 | 197.892 | 128 | 256 | 10.99 |
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
---|---|---|---|---|---|---|---|---|---|
resnet26t | 3691.94 | 69.327 | 256 | 256 | 1188.17 | 214.96 | 256 | 256 | 16.01 |
botnet26t_256 | 3291.63 | 77.76 | 256 | 256 | 1126.68 | 226.653 | 256 | 256 | 12.49 |
halonet26t | 3230.5 | 79.232 | 256 | 256 | 1077.82 | 236.934 | 256 | 256 | 12.48 |
lambda_resnet26rpt_256 | 2324.15 | 110.133 | 256 | 256 | 864.42 | 147.485 | 128 | 256 | 10.99 |
lambda_resnet26t | Not Supported |
ResNeXt-26-T series
- [2, 2, 2, 2] repeat Bottleneck block ResNeXt architectures
- SiLU activations
- grouped 3x3 convolutions in bottleneck, 32 channels per group
- 3 layer stem with 24, 32, 64 chs, max-pool
- avg pool in shortcut downsample
- channel attn (active in non self-attn blocks) between 3x3 and last 1x1 conv
- when active, self-attn blocks replace 3x3 conv in both blocks for last stage, and second block of penultimate stage
model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
---|---|---|---|---|---|---|---|---|
eca_halonext26ts | 79.484 | 20.516 | 94.600 | 5.400 | 10.76 | 256 | 0.94 | bicubic |
eca_botnext26ts_256 | 79.270 | 20.730 | 94.594 | 5.406 | 10.59 | 256 | 0.95 | bicubic |
bat_resnext26ts | 78.268 | 21.732 | 94.1 | 5.9 | 10.73 | 256 | 0.9 | bicubic |
seresnext26ts | 77.852 | 22.148 | 93.784 | 6.216 | 10.39 | 256 | 0.9 | bicubic |
gcresnext26ts | 77.804 | 22.196 | 93.824 | 6.176 | 10.48 | 256 | 0.9 | bicubic |
eca_resnext26ts | 77.446 | 22.554 | 93.57 | 6.43 | 10.3 | 256 | 0.9 | bicubic |
resnext26ts | 76.764 | 23.236 | 93.136 | 6.864 | 10.3 | 256 | 0.9 | bicubic |
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
---|---|---|---|---|---|---|---|---|---|
resnext26ts | 3006.57 | 85.134 | 256 | 256 | 864.4 | 295.646 | 256 | 256 | 10.3 |
seresnext26ts | 2931.27 | 87.321 | 256 | 256 | 836.92 | 305.193 | 256 | 256 | 10.39 |
eca_resnext26ts | 2925.47 | 87.495 | 256 | 256 | 837.78 | 305.003 | 256 | 256 | 10.3 |
gcresnext26ts | 2870.01 | 89.186 | 256 | 256 | 818.35 | 311.97 | 256 | 256 | 10.48 |
eca_botnext26ts_256 | 2652.03 | 96.513 | 256 | 256 | 790.43 | 323.257 | 256 | 256 | 10.59 |
eca_halonext26ts | 2593.03 | 98.705 | 256 | 256 | 766.07 | 333.541 | 256 | 256 | 10.76 |
bat_resnext26ts | 2469.78 | 103.64 | 256 | 256 | 697.21 | 365.964 | 256 | 256 | 10.73 |
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
NOTE: there are performance issues with certain grouped conv configs with channels last layout, backwards pass in particular is really slow. Also causing issues for RegNet and NFNet networks.
model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
---|---|---|---|---|---|---|---|---|---|
resnext26ts | 3952.37 | 64.755 | 256 | 256 | 608.67 | 420.049 | 256 | 256 | 10.3 |
eca_resnext26ts | 3815.77 | 67.074 | 256 | 256 | 594.35 | 430.146 | 256 | 256 | 10.3 |
seresnext26ts | 3802.75 | 67.304 | 256 | 256 | 592.82 | 431.14 | 256 | 256 | 10.39 |
gcresnext26ts | 3626.97 | 70.57 | 256 | 256 | 581.83 | 439.119 | 256 | 256 | 10.48 |
eca_botnext26ts_256 | 3515.84 | 72.8 | 256 | 256 | 611.71 | 417.862 | 256 | 256 | 10.59 |
eca_halonext26ts | 3410.12 | 75.057 | 256 | 256 | 597.52 | 427.789 | 256 | 256 | 10.76 |
bat_resnext26ts | 3053.83 | 83.811 | 256 | 256 | 533.23 | 478.839 | 256 | 256 | 10.73 |
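The NHWC numbers above are measured with PyTorch's channels-last memory format; a minimal sketch of enabling it (a plain Conv2d stand-in here, not one of the timm models):

```python
import torch
import torch.nn as nn

# move weights and input to channels-last (NHWC) memory layout;
# convolutions propagate the format to their outputs
model = nn.Conv2d(3, 64, kernel_size=3, padding=1).to(memory_format=torch.channels_last)
x = torch.randn(8, 3, 256, 256).to(memory_format=torch.channels_last)

y = model(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```

With AMP on Ampere GPUs this layout lets cuDNN select NHWC tensor-core kernels, which is where the throughput gains in the table come from; as the note above flags, some grouped-conv configs regress instead.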
ResNet-33-T series
- [2, 3, 3, 2] repeat Bottleneck block ResNet architecture
- SiLU activations
- 3 layer stem with 24, 32, 64 chs, no max-pool, 1st and 3rd conv stride 2
- avg pool in shortcut downsample
- channel attn (active in non self-attn blocks) between 3x3 and last 1x1 conv
- when active, self-attn blocks replace 3x3 conv last block of stage 2 and 3, and both blocks of final stage
- FC 1x1 conv between last block and classifier
The 33-layer models have an extra 1x1 FC layer between the last conv block and the classifier. There are both a non-attention 33-layer baseline and a 32-layer variant without the extra FC.
model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
---|---|---|---|---|---|---|---|---|
sehalonet33ts | 80.986 | 19.014 | 95.272 | 4.728 | 13.69 | 256 | 0.94 | bicubic |
seresnet33ts | 80.388 | 19.612 | 95.108 | 4.892 | 19.78 | 256 | 0.94 | bicubic |
eca_resnet33ts | 80.132 | 19.868 | 95.054 | 4.946 | 19.68 | 256 | 0.94 | bicubic |
gcresnet33ts | 79.99 | 20.01 | 94.988 | 5.012 | 19.88 | 256 | 0.94 | bicubic |
resnet33ts | 79.352 | 20.648 | 94.596 | 5.404 | 19.68 | 256 | 0.94 | bicubic |
resnet32ts | 79.028... |
v0.4.12. Vision Transformer AugReg support and more
- Vision Transformer AugReg weights and model defs (https://arxiv.org/abs/2106.10270)
- ResMLP official weights
- ECA-NFNet-L2 weights
- gMLP-S weights
- ResNet51-Q
- Visformer, LeViT, ConViT, Twins
- Many fixes, improvements, better test coverage
3rd Party Vision Transformer Weights
A catch-all (ish) release for storing vision transformer weights adapted/rehosted from 3rd parties. Too many incoming models for one release per source...
Containing weights from:
- Twins - https://github.com/Meituan-AutoML/Twins
- Visformer - danczs/Visformer#2
- NesT (Aggregated Nested Transformer) - weights converted from https://github.com/google-research/nested-transformer by @alexander-soare's script
v0.4.9. EfficientNetV2. MLP-Mixer. ResNet-RS. More vision transformers.
Fix drop/drop_path arg on MLP-Mixer model. Fix #641
EfficientNet-V2 weights ported from Tensorflow impl
Weights from https://github.com/google/automl/tree/master/efficientnetv2
Paper: EfficientNetV2: Smaller Models and Faster Training
- https://arxiv.org/abs/2104.00298
ResNet-RS weights
Weights for ResNet-RS models as per #554. Ported from the Tensorflow impl (https://github.com/tensorflow/tpu/tree/master/models/official/resnet/resnet_rs) by @amaarora
Weights for CoaT (vision transformer) models
Weights for CoaT: Co-Scale Conv-Attentional Image Transformers (from https://github.com/mlpc-ucsd/CoaT)
Weights for PiT (Pooling-based Vision Transformer) models
Weights from https://github.com/naver-ai/pit
Copyright 2021-present NAVER Corp.
Rehosted here for easy pytorch hub downloads.