Training code for three variants of ResNet on ImageNet: the original ResNet, pre-activation ResNet, and Squeeze-and-Excitation (SE) ResNet.
The training follows the exact recipe used by the "Training ImageNet in 1 Hour" paper and gets the same performance. Models can be downloaded here.
This recipe has better performance than most open source implementations. In fact, many papers that claim to "improve" ResNet only compete with a lower baseline and they actually cannot beat this ResNet recipe.
Model | Top 5 Error | Top 1 Error | Download |
---|---|---|---|
ResNet18 | 10.50% | 29.66% | ⬇️ |
ResNet34 | 8.56% | 26.17% | ⬇️ |
ResNet50 | 6.85% | 23.61% | ⬇️ |
ResNet50-SE | 6.24% | 22.64% | ⬇️ |
ResNet101 | 6.04% | 21.95% | ⬇️ |
ResNet152 | 5.78% | 21.51% | ⬇️ |
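For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how a top-k error can be computed from per-class scores. The function name `topk_error` and the toy scores are hypothetical and are not taken from `imagenet-resnet.py`; the table above uses k=1 and k=5 over the ImageNet classes, while the toy data here uses k=1 and k=2 over 4 classes.

```python
def topk_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    wrong = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in topk:
            wrong += 1
    return wrong / len(labels)


# Toy example: 3 samples, 4 classes (made-up scores, for illustration only).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: correct at k=1
    [0.5, 0.1, 0.3, 0.1],  # true class 2: wrong at k=1, correct at k=2
    [0.4, 0.3, 0.2, 0.1],  # true class 3: wrong at both k=1 and k=2
]
labels = [1, 2, 3]

top1 = topk_error(scores, labels, k=1)
top2 = topk_error(scores, labels, k=2)
print(f"top-1 error: {top1:.2%}, top-2 error: {top2:.2%}")
```

A top-k error can never exceed the top-1 error on the same predictions, which is why the Top 5 column is always the smaller of the two.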
To train, first decompress ImageNet data into this structure, then:
./imagenet-resnet.py --data /path/to/original/ILSVRC -d 50 [--mode resnet/preact/se]
# See ./imagenet-resnet.py -h for other options.
You should be able to see good GPU utilization (95%~99%), if your data is fast enough. With batch=64x8, it can finish 100 epochs in 16 hours on AWS p3.16xlarge (8 V100s).
The default data pipeline is probably OK for machines with SSD & 20 CPU cores. See the tutorial on other options to speed up your data.
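As a rough sanity check on whether your data pipeline is fast enough, the figures quoted above can be turned into a required throughput. This is back-of-the-envelope arithmetic only; the training set size of 1,281,167 images is an assumption (the usual ILSVRC12 figure) and is not stated in this document.

```python
TRAIN_IMAGES = 1_281_167  # ILSVRC12 training set size (assumption, not stated above)
EPOCHS = 100
HOURS = 16
NUM_GPUS = 8
BATCH_PER_GPU = 64

total_batch = BATCH_PER_GPU * NUM_GPUS         # batch=64x8
steps_per_epoch = TRAIN_IMAGES // total_batch  # full batches per epoch
images_per_sec = TRAIN_IMAGES * EPOCHS / (HOURS * 3600)
images_per_sec_per_gpu = images_per_sec / NUM_GPUS

print(f"total batch size:     {total_batch}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"images/sec (total):   {images_per_sec:.0f}")
print(f"images/sec (per GPU): {images_per_sec_per_gpu:.0f}")
```

Under these assumptions the data pipeline has to sustain roughly 2,200 images per second in total to keep 8 GPUs busy at the quoted speed.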
The load-resnet.py script only converts and runs the ImageNet-ResNet{50,101,152} Caffe models released by MSRA.
Note that the architecture is different from that of the imagenet-resnet.py script, and the models are not compatible.
ResNets have evolved; in general, you should not cite these numbers as baselines in your paper.
Usage:
# download and convert caffe model to npz format
python -m tensorpack.utils.loadcaffe PATH/TO/{ResNet-101-deploy.prototxt,ResNet-101-model.caffemodel} ResNet101.npz
# run on an image
./load-resnet.py --load ResNet101.npz --input cat.jpg --depth 101
The converted models are verified on the ILSVRC12 validation set. The per-pixel mean used here is slightly different from the original.
Model | Top 5 Error | Top 1 Error |
---|---|---|
ResNet 50 | 7.78% | 24.77% |
ResNet 101 | 7.11% | 23.54% |
ResNet 152 | 6.71% | 23.21% |
Reproduce pre-activation ResNet on CIFAR10.
Also see a DenseNet implementation of the paper Densely Connected Convolutional Networks.
Reproduce the mixup pre-act ResNet-18 CIFAR10 experiment from the mixup paper.
This implementation follows the exact settings from the author's code. Note that the architecture is different from the official preact-ResNet18.
Usage:
./cifar10-preact18-mixup.py # train without mixup
./cifar10-preact18-mixup.py --mixup # with mixup
Results of the reference code can be reproduced. In one run it gives me 5.48% error without mixup and 4.17% error with mixup (alpha=1).
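For context, the core idea of mixup is to train on convex combinations of pairs of examples and their labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution. The following is a minimal standard-library sketch of that idea on flat vectors; the helper `mixup_pair` is hypothetical and is not necessarily how `cifar10-preact18-mixup.py` implements it.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Blend two (input, one-hot label) pairs with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Toy example: two 4-"pixel" inputs with 2-class one-hot labels (made-up data).
rng = random.Random(0)  # seeded only to make the sketch repeatable
x_a, y_a = [1.0, 1.0, 1.0, 1.0], [1.0, 0.0]
x_b, y_b = [0.0, 0.0, 0.0, 0.0], [0.0, 1.0]

x_mix, y_mix, lam = mixup_pair(x_a, y_a, x_b, y_b, alpha=1.0, rng=rng)
print(f"lambda={lam:.3f}, mixed label={y_mix}")
```

With alpha=1, as in the run reported above, Beta(1, 1) is the uniform distribution on [0, 1], so every mixing ratio is equally likely. The mixed label stays a valid probability vector because both weights sum to 1.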