
Releases: huggingface/pytorch-image-models

v0.5.4 - More weights, models. ResNet strikes back, self-attn - convnet hybrids, optimizers and more

17 Jan 05:03
Default conv_mlp to False across the board for ConvNeXt, causing issues on more setups than it's improving right now...
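If a particular setup does benefit from the conv-based MLP variant, it can presumably still be requested per model. A minimal hedged sketch, assuming `create_model` forwards the `conv_mlp` kwarg to the ConvNeXt constructor as in the current implementation:

```python
import timm

# Hedged sketch: conv_mlp now defaults to False for ConvNeXt models.
# Passing it explicitly (assuming create_model forwards the kwarg) restores
# the 1x1-conv MLP variant for setups where it happens to be faster.
model = timm.create_model('convnext_tiny', pretrained=False, conv_mlp=True)
```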

v0.1-rsb-weights

04 Oct 00:02

Weights for ResNet Strikes Back

Paper: https://arxiv.org/abs/2110.00476

More details on weights and hparams to come...

v0.1-attn-weights

04 Sep 00:25

A collection of weights I've trained comparing various types of SE-like (SE, ECA, GC, etc.) and self-attention (bottleneck, halo, lambda) blocks, alongside related non-attn baselines.

ResNet-26-T series

  • [2, 2, 2, 2] repeat Bottleneck block ResNet architecture
  • ReLU activations
  • 3 layer stem with 24, 32, 64 chs, max-pool
  • avg pool in shortcut downsample
  • self-attn blocks replace the 3x3 conv in both blocks of the last stage, and in the second block of the penultimate stage
| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| botnet26t_256 | 79.246 | 20.754 | 94.53 | 5.47 | 12.49 | 256 | 0.95 | bicubic |
| halonet26t | 79.13 | 20.87 | 94.314 | 5.686 | 12.48 | 256 | 0.95 | bicubic |
| lambda_resnet26t | 79.112 | 20.888 | 94.59 | 5.41 | 10.96 | 256 | 0.94 | bicubic |
| lambda_resnet26rpt_256 | 78.964 | 21.036 | 94.428 | 5.572 | 10.99 | 256 | 0.94 | bicubic |
| resnet26t | 77.872 | 22.128 | 93.834 | 6.166 | 16.01 | 256 | 0.94 | bicubic |

Details:

  • HaloNet - 8 pixel block size, 2 pixel halo (overlap), relative position embedding
  • BotNet - relative position embedding
  • Lambda-ResNet-26-T - 3d lambda conv, kernel = 9
  • Lambda-ResNet-26-RPT - relative position embedding
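A minimal usage sketch for these checkpoints, assuming a recent `timm` install; the crop_pct / interpolation values in the table above come straight from each model's pretrained config:

```python
import torch
import timm

# One of the self-attn ResNet-26-T variants from the table above.
model = timm.create_model('halonet26t', pretrained=True)
model.eval()

# The pretrained config carries the eval-time input size, crop_pct and interpolation.
cfg = model.default_cfg
print(cfg['input_size'], cfg['crop_pct'], cfg['interpolation'])  # (3, 256, 256) 0.95 bicubic

with torch.no_grad():
    logits = model(torch.randn(1, *cfg['input_size']))  # dummy 256x256 input
print(logits.shape)  # torch.Size([1, 1000])
```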

Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet26t | 2967.55 | 86.252 | 256 | 256 | 857.62 | 297.984 | 256 | 256 | 16.01 |
| botnet26t_256 | 2642.08 | 96.879 | 256 | 256 | 809.41 | 315.706 | 256 | 256 | 12.49 |
| halonet26t | 2601.91 | 98.375 | 256 | 256 | 783.92 | 325.976 | 256 | 256 | 12.48 |
| lambda_resnet26t | 2354.1 | 108.732 | 256 | 256 | 697.28 | 366.521 | 256 | 256 | 10.96 |
| lambda_resnet26rpt_256 | 1847.34 | 138.563 | 256 | 256 | 644.84 | 197.892 | 128 | 256 | 10.99 |

Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet26t | 3691.94 | 69.327 | 256 | 256 | 1188.17 | 214.96 | 256 | 256 | 16.01 |
| botnet26t_256 | 3291.63 | 77.76 | 256 | 256 | 1126.68 | 226.653 | 256 | 256 | 12.49 |
| halonet26t | 3230.5 | 79.232 | 256 | 256 | 1077.82 | 236.934 | 256 | 256 | 12.48 |
| lambda_resnet26rpt_256 | 2324.15 | 110.133 | 256 | 256 | 864.42 | 147.485 | 128 | 256 | 10.99 |
| lambda_resnet26t | Not Supported | | | | | | | | |
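These numbers come from the repo's benchmark script on the hardware/stack noted in the headings. A rough, hedged Python sketch of the same NCHW vs NHWC (channels-last) inference comparison with AMP, assuming a CUDA GPU is available:

```python
import time
import torch
import timm

def infer_throughput(model, batch_size=256, img_size=256, channels_last=False, steps=20):
    # Rough AMP inference throughput, loosely mirroring the infer_samples_per_sec column above.
    model = model.cuda().eval()
    x = torch.randn(batch_size, 3, img_size, img_size, device='cuda')
    if channels_last:
        model = model.to(memory_format=torch.channels_last)
        x = x.contiguous(memory_format=torch.channels_last)
    with torch.no_grad(), torch.cuda.amp.autocast():
        for _ in range(5):  # warmup
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(steps):
            model(x)
        torch.cuda.synchronize()
    return steps * batch_size / (time.time() - start)

model = timm.create_model('resnet26t', pretrained=False)
print('NCHW samples/sec:', infer_throughput(model))
print('NHWC samples/sec:', infer_throughput(model, channels_last=True))
```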

ResNeXt-26-T series

  • [2, 2, 2, 2] repeat Bottleneck block ResNeXt architecture
  • SiLU activations
  • grouped 3x3 convolutions in bottleneck, 32 channels per group
  • 3 layer stem with 24, 32, 64 chs, max-pool
  • avg pool in shortcut downsample
  • channel attn (active in non self-attn blocks) between 3x3 and last 1x1 conv
  • when active, self-attn blocks replace the 3x3 conv in both blocks of the last stage, and in the second block of the penultimate stage
| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| eca_halonext26ts | 79.484 | 20.516 | 94.600 | 5.400 | 10.76 | 256 | 0.94 | bicubic |
| eca_botnext26ts_256 | 79.270 | 20.730 | 94.594 | 5.406 | 10.59 | 256 | 0.95 | bicubic |
| bat_resnext26ts | 78.268 | 21.732 | 94.1 | 5.9 | 10.73 | 256 | 0.9 | bicubic |
| seresnext26ts | 77.852 | 22.148 | 93.784 | 6.216 | 10.39 | 256 | 0.9 | bicubic |
| gcresnext26ts | 77.804 | 22.196 | 93.824 | 6.176 | 10.48 | 256 | 0.9 | bicubic |
| eca_resnext26ts | 77.446 | 22.554 | 93.57 | 6.43 | 10.3 | 256 | 0.9 | bicubic |
| resnext26ts | 76.764 | 23.236 | 93.136 | 6.864 | 10.3 | 256 | 0.9 | bicubic |
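A small hedged sketch for finding this family by name pattern and checking which variants have pretrained weights registered (the `*26ts` suffix is just the naming used in the table above):

```python
import timm

# All registered models matching the resnext26ts family.
print(timm.list_models('*resnext26ts*'))
# Only those with pretrained weights available.
print(timm.list_models('*resnext26ts*', pretrained=True))
```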

Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnext26ts | 3006.57 | 85.134 | 256 | 256 | 864.4 | 295.646 | 256 | 256 | 10.3 |
| seresnext26ts | 2931.27 | 87.321 | 256 | 256 | 836.92 | 305.193 | 256 | 256 | 10.39 |
| eca_resnext26ts | 2925.47 | 87.495 | 256 | 256 | 837.78 | 305.003 | 256 | 256 | 10.3 |
| gcresnext26ts | 2870.01 | 89.186 | 256 | 256 | 818.35 | 311.97 | 256 | 256 | 10.48 |
| eca_botnext26ts_256 | 2652.03 | 96.513 | 256 | 256 | 790.43 | 323.257 | 256 | 256 | 10.59 |
| eca_halonext26ts | 2593.03 | 98.705 | 256 | 256 | 766.07 | 333.541 | 256 | 256 | 10.76 |
| bat_resnext26ts | 2469.78 | 103.64 | 256 | 256 | 697.21 | 365.964 | 256 | 256 | 10.73 |

Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09

NOTE: there are performance issues with certain grouped conv configs in channels-last layout; the backwards pass in particular is very slow. This also affects RegNet and NFNet networks.

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnext26ts | 3952.37 | 64.755 | 256 | 256 | 608.67 | 420.049 | 256 | 256 | 10.3 |
| eca_resnext26ts | 3815.77 | 67.074 | 256 | 256 | 594.35 | 430.146 | 256 | 256 | 10.3 |
| seresnext26ts | 3802.75 | 67.304 | 256 | 256 | 592.82 | 431.14 | 256 | 256 | 10.39 |
| gcresnext26ts | 3626.97 | 70.57 | 256 | 256 | 581.83 | 439.119 | 256 | 256 | 10.48 |
| eca_botnext26ts_256 | 3515.84 | 72.8 | 256 | 256 | 611.71 | 417.862 | 256 | 256 | 10.59 |
| eca_halonext26ts | 3410.12 | 75.057 | 256 | 256 | 597.52 | 427.789 | 256 | 256 | 10.76 |
| bat_resnext26ts | 3053.83 | 83.811 | 256 | 256 | 533.23 | 478.839 | 256 | 256 | 10.73 |

ResNet-33-T series

  • [2, 3, 3, 2] repeat Bottleneck block ResNet architecture
  • SiLU activations
  • 3 layer stem with 24, 32, 64 chs, no max-pool, 1st and 3rd conv stride 2
  • avg pool in shortcut downsample
  • channel attn (active in non self-attn blocks) between 3x3 and last 1x1 conv
  • when active, self-attn blocks replace the 3x3 conv in the last block of stages 2 and 3, and in both blocks of the final stage
  • FC 1x1 conv between last block and classifier

The 33-layer models have an extra 1x1 FC layer between the last conv block and the classifier. There is both a non-attention 33-layer baseline and a 32-layer variant without the extra FC.
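A hedged sketch of that difference in practice, comparing parameter counts of the 33-layer model (with the extra FC) and the 32-layer baseline; the numbers should roughly match the param_count column below:

```python
import timm

def n_params(model):
    return sum(p.numel() for p in model.parameters())

# resnet33ts has the extra 1x1 FC conv before the classifier; resnet32ts does not.
m33 = timm.create_model('resnet33ts', pretrained=False)
m32 = timm.create_model('resnet32ts', pretrained=False)
print(f'resnet33ts: {n_params(m33) / 1e6:.2f}M params')
print(f'resnet32ts: {n_params(m32) / 1e6:.2f}M params')
```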

| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| sehalonet33ts | 80.986 | 19.014 | 95.272 | 4.728 | 13.69 | 256 | 0.94 | bicubic |
| seresnet33ts | 80.388 | 19.612 | 95.108 | 4.892 | 19.78 | 256 | 0.94 | bicubic |
| eca_resnet33ts | 80.132 | 19.868 | 95.054 | 4.946 | 19.68 | 256 | 0.94 | bicubic |
| gcresnet33ts | 79.99 | 20.01 | 94.988 | 5.012 | 19.88 | 256 | 0.94 | bicubic |
| resnet33ts | 79.352 | 20.648 | 94.596 | 5.404 | 19.68 | 256 | 0.94 | bicubic |
| resnet32ts | 79.028... | | | | | | | |

v0.4.12. Vision Transformer AugReg support and more

30 Jun 16:35
  • Vision Transformer AugReg weights and model defs (https://arxiv.org/abs/2106.10270)
  • ResMLP official weights
  • ECA-NFNet-L2 weights
  • gMLP-S weights
  • ResNet51-Q
  • Visformer, LeViT, ConViT, Twins
  • Many fixes, improvements, better test coverage

3rd Party Vision Transformer Weights

21 May 23:02

A catch-all (ish) release for storing vision transformer weights adapted/rehosted from 3rd parties. Too many incoming models for one release per source...

Containing weights from:

v0.4.9. EfficientNetV2. MLP-Mixer. ResNet-RS. More vision transformers.

18 May 23:17
Fix drop/drop_path arg on MLP-Mixer model. Fix #641

EfficientNet-V2 weights ported from TensorFlow impl

14 May 18:10

ResNet-RS weights

04 May 01:18

Weights for ResNet-RS models as per #554. Ported from TensorFlow impl (https://github.com/tensorflow/tpu/tree/master/models/official/resnet/resnet_rs) by @amaarora

Weights for CoaT (vision transformer) models

28 Apr 23:08

Weights for CoaT: Co-Scale Conv-Attentional Image Transformers (from https://github.com/mlpc-ucsd/CoaT)

Weights for PiT (Pooling-based Vision Transformer) models

31 Mar 19:11

Weights from https://github.com/naver-ai/pit

Copyright 2021-present NAVER Corp.

Rehosted here for easy PyTorch Hub downloads.