[Feature] Add DCFF #295
# Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion

## Abstract

The mainstream approach for filter pruning is usually either to force a hard-coded importance estimation upon a computation-heavy pretrained model to select "important" filters, or to impose a hyperparameter-sensitive sparse constraint on the loss objective to regularize the network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to derive compact CNNs in a computation-economical and regularization-free manner for efficient image classification. Each filter in our DCFF is first given an inter-similarity distribution with a temperature parameter as a filter proxy, on top of which a fresh Kullback-Leibler divergence based dynamic-coded criterion is proposed to evaluate the filter importance. In contrast to simply keeping high-score filters as in other methods, we propose the concept of filter fusion, i.e., the weighted averages using the assigned proxies, as our preserved filters. We obtain a one-hot inter-similarity distribution as the temperature parameter approaches infinity. Thus, the relative importance of each filter can vary along with the training of the compact CNN, leading to dynamically changeable fused filters without either a dependency on a pretrained model or the introduction of sparse constraints. Extensive experiments on classification benchmarks demonstrate the superiority of our DCFF over the compared counterparts. For example, our DCFF derives a compact VGGNet-16 with only 72.77M FLOPs and 1.06M parameters while reaching a top-1 accuracy of 93.47% on CIFAR-10. A compact ResNet-50 is obtained with 63.8% FLOPs and 58.6% parameter reductions, retaining 75.60% top-1 accuracy on ILSVRC-2012.

![pipeline](https://user-images.githubusercontent.com/31244134/189286581-722853ba-c6d7-4a39-b902-37995b444c71.jpg)
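The abstract's key mechanism can be sketched in a few lines. The following is a minimal, hypothetical illustration of temperature-controlled inter-similarity proxies and filter fusion for a single conv layer; the function name, the use of Euclidean distance, and the KL-against-uniform importance score are assumptions for illustration, not the authors' exact formulation:

```python
import torch
import torch.nn.functional as F


def fuse_filters(weight: torch.Tensor, keep: int, t: float):
    """Sketch of DCFF-style filter fusion for one conv layer.

    weight: (out_channels, in_channels, k, k) convolution weight
    keep:   number of fused filters to preserve
    t:      temperature; as t grows, each proxy approaches one-hot
    """
    n = weight.size(0)
    flat = weight.reshape(n, -1)
    # Pairwise Euclidean distances between flattened filters.
    dist = torch.cdist(flat, flat)  # (n, n), zero on the diagonal
    # Inter-similarity distribution (the filter "proxy"); each row
    # sums to 1 and sharpens toward one-hot as t -> infinity.
    proxy = F.softmax(-t * dist, dim=1)
    # KL divergence from the uniform distribution as an importance
    # score (the paper's dynamic-coded criterion may differ).
    uniform = torch.full_like(proxy, 1.0 / n)
    score = (proxy * (proxy / uniform).log()).sum(dim=1)
    top = score.topk(keep).indices
    # Preserved filters are weighted averages under the proxies,
    # rather than raw copies of the top-scoring filters.
    fused = proxy[top] @ flat  # (keep, n) @ (n, d)
    return fused.reshape(keep, *weight.shape[1:]), top


# Demo: fuse an 8-filter 3x3 conv layer down to 4 filters.
w = torch.randn(8, 3, 3, 3)
fused, kept = fuse_filters(w, keep=4, t=1.0)
```

Because the proxies sharpen toward one-hot as the temperature increases during training, the fused filters gradually converge to ordinary selected filters, which is what makes the fusion "dynamic".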
## Results and models
### 1. Classification

| Dataset | Backbone | Params(M) | FLOPs(M) | lr_type | Top-1 (%) | Top-5 (%) | CPrate | Config | Download |
| :------: | :----------: | :-------: | :------: | :-----: | :-------: | :-------: | :---------------------------------------------: | :--------------------------------------------------: | :----------------------: |
| ImageNet | DCFFResNet50 | 15.16 | 2260 | step | 73.96 | 91.66 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmcls/dcff/dcff_resnet_8xb32_in1k.py) | [model](<>) \| [log](<>) |

### 2. Detection

| Dataset | Method | Backbone | Style | Lr schd | Params(M) | FLOPs(M) | bbox AP | CPrate | Config | Download |
| :-----: | :---------: | :----------: | :-----: | :-----: | :-------: | :------: | :-----: | :---------------------------------------------: | :---------------------------------------------------------------: | :----------------------: |
| COCO | Faster_RCNN | DCFFResNet50 | pytorch | step | 33.31 | 168320 | 35.8 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py) | [model](<>) \| [log](<>) |

### 3. Segmentation

| Dataset | Method | Backbone | crop size | Lr schd | Params(M) | FLOPs(M) | mIoU | CPrate | Config | Download |
| :--------: | :-------: | :-------------: | :-------: | :-----: | :-------: | :------: | :---: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------: | :----------------------: |
| Cityscapes | PointRend | DCFFResNetV1c50 | 512x1024 | 160k | 18.43 | 74410 | 76.75 | \[0.0, 0.0, 0.0\] + \[0.35, 0.4, 0.1\] * 10 + \[0.3, 0.3, 0.1\] * 6 | [config](../../mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py) | [model](<>) \| [log](<>) |

### 4. Pose

| Dataset | Method | Backbone | crop size | total epochs | Params(M) | FLOPs(M) | AP | CPrate | Config | Download |
| :-----: | :-------------: | :----------: | :-------: | :----------: | :-------: | :------: | :--: | :--------------------------------------------------------: | :---------------------------------------------------------------: | :----------------------: |
| COCO | TopDown HeatMap | DCFFResNet50 | 256x192 | 300 | 26.95 | 4290 | 68.3 | \[0.0\] + \[0.2, 0.2, 0.1\] * 10 + \[0.15, 0.15, 0.1\] * 6 | [config](../../mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py) | [model](<>) \| [log](<>) |
## Citation

```latex
@article{lin2021training,
  title={Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion},
  author={Lin, Mingbao and Ji, Rongrong and Chen, Bohong and Chao, Fei and Liu, Jianzhuang and Zeng, Wei and Tian, Yonghong and Tian, Qi},
  journal={arXiv preprint arXiv:2107.06916},
  year={2021}
}
```
## Getting Started

### Generate the channel config file

Generate `resnet_cls.json` with `tools/get_channel_units.py`:

```bash
python tools/get_channel_units.py \
    configs/pruning/mmcls/dcff/dcff_resnet50_8xb32_in1k.py \
    -c -i --output-path=configs/pruning/mmcls/dcff/resnet_cls.json
```

Then set each layer's pruning rate in `target_pruning_ratio` according to `resnet_cls.json`.
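For illustration, the mapping from generated unit names to ratios can be sketched as below. The two unit names are taken from the `dcff_resnet_8xb32_in1k.py` config in this PR; in practice the full list comes from the generated `resnet_cls.json`:

```python
# Unit names follow the '<layer>_(start, end)_<channels>' pattern used in
# the DCFF configs; only two are listed here for illustration.
units = [
    'backbone.layer1.0.conv1_(0, 64)_64',
    'backbone.layer1.0.conv2_(0, 64)_64',
]

# Keep 65% of the channels in each unit.
target_pruning_ratio = {name: 0.65 for name in units}
```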
### Train DCFF

#### Classification

##### ImageNet

```bash
sh tools/slurm_train.sh $PARTITION $JOB_NAME \
    configs/pruning/mmcls/dcff/dcff_resnet50_8xb32_in1k.py \
    $WORK_DIR
```

### Test DCFF

#### Classification

##### ImageNet

```bash
sh tools/slurm_test.sh $PARTITION $JOB_NAME \
    configs/pruning/mmcls/dcff/dcff_compact_resnet50_8xb32_in1k.py \
    $WORK_DIR
```
---

```python
_base_ = ['dcff_resnet_8xb32_in1k.py']

# model settings
model = _base_.model
model['is_deployed'] = True
```
---

```python
_base_ = [
    'mmcls::_base_/datasets/imagenet_bs32.py',
    'mmcls::_base_/schedules/imagenet_bs256.py',
    'mmcls::_base_/default_runtime.py'
]

stage_ratio_1 = 0.65
stage_ratio_2 = 0.6
stage_ratio_3 = 0.9
stage_ratio_4 = 0.7

# the config template of target_pruning_ratio can be generated by
# python ./tools/get_channel_units.py {config_file} --choice
target_pruning_ratio = {
    'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1,
    'backbone.layer1.0.conv2_(0, 64)_64': stage_ratio_2,
    'backbone.layer1.0.conv3_(0, 256)_256': stage_ratio_3,
    'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1,
    'backbone.layer1.1.conv2_(0, 64)_64': stage_ratio_2,
    'backbone.layer1.2.conv1_(0, 64)_64': stage_ratio_1,
    'backbone.layer1.2.conv2_(0, 64)_64': stage_ratio_2,
    # block 1 [0.65, 0.6] downsample=[0.9]
    'backbone.layer2.0.conv1_(0, 128)_128': stage_ratio_1,
    'backbone.layer2.0.conv2_(0, 128)_128': stage_ratio_2,
    'backbone.layer2.0.conv3_(0, 512)_512': stage_ratio_3,
    'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_1,
    'backbone.layer2.1.conv2_(0, 128)_128': stage_ratio_2,
    'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_1,
    'backbone.layer2.2.conv2_(0, 128)_128': stage_ratio_2,
    'backbone.layer2.3.conv1_(0, 128)_128': stage_ratio_1,
    'backbone.layer2.3.conv2_(0, 128)_128': stage_ratio_2,
    # block 2 [0.65, 0.6] downsample=[0.9]
    'backbone.layer3.0.conv1_(0, 256)_256': stage_ratio_1,
    'backbone.layer3.0.conv2_(0, 256)_256': stage_ratio_2,
    'backbone.layer3.0.conv3_(0, 1024)_1024': stage_ratio_3,
    'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_1,
    'backbone.layer3.1.conv2_(0, 256)_256': stage_ratio_2,
    'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_1,
    'backbone.layer3.2.conv2_(0, 256)_256': stage_ratio_2,
    'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_4,
    'backbone.layer3.3.conv2_(0, 256)_256': stage_ratio_4,
    'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_4,
    'backbone.layer3.4.conv2_(0, 256)_256': stage_ratio_4,
    'backbone.layer3.5.conv1_(0, 256)_256': stage_ratio_4,
    'backbone.layer3.5.conv2_(0, 256)_256': stage_ratio_4,
    # block 3 [0.65, 0.6]*2+[0.7, 0.7]*2 downsample=[0.9]
    'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4,
    'backbone.layer4.0.conv2_(0, 512)_512': stage_ratio_4,
    'backbone.layer4.0.conv3_(0, 2048)_2048': stage_ratio_3,
    'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4,
    'backbone.layer4.1.conv2_(0, 512)_512': stage_ratio_4,
    'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4,
    'backbone.layer4.2.conv2_(0, 512)_512': stage_ratio_4
    # block 4 [0.7, 0.7] downsample=[0.9]
}

optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001))
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[30, 60, 90], gamma=0.1)
train_cfg = dict(by_epoch=True, max_epochs=120, val_interval=1)

data_preprocessor = {'type': 'mmcls.ClsDataPreprocessor'}

# model settings
model = dict(
    _scope_='mmrazor',
    type='DCFF',
    architecture=dict(
        cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=False),
    mutator_cfg=dict(
        type='DCFFChannelMutator',
        channel_unit_cfg=dict(
            type='DCFFChannelUnit', default_args=dict(choice_mode='ratio')),
        parse_cfg=dict(
            type='BackwardTracer',
            loss_calculator=dict(type='ImageClassifierPseudoLoss'))),
    target_pruning_ratio=target_pruning_ratio,
    step_freq=1,
    linear_schedule=False,
    is_deployed=False)
```
---

> **Reviewer:** This README should be split and placed in mmcls/mmdet/mmseg/mmpose, respectively.
>
> **Author:** Updated. For convenience, the empirical statistics are not split.