
Auto pruners #2490

Merged
merged 107 commits into microsoft:master on Jun 30, 2020

Conversation

suiguoxin
Member

Add algo implementation / examples / test / doc for the following pruning algos:

  • NetAdapt
  • SimulatedAnnealing
  • ADMM
  • AutoCompress

- **trainer:** Function used for the first optimization subproblem.
This function should take `model, optimizer, criterion, epoch, callback` as parameters, where `callback` should be called right after `loss.backward()` in the normal training process (see the sketch below).
- **optimize_iteration:** Number of ADMM optimization iterations.
- **training_epochs:** Training epochs of the first optimization subproblem.
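
For illustration, a minimal sketch of such a trainer (the data loader, device and tensor shapes here are hypothetical placeholders, not part of the pruner API):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical data and device, only to make the sketch self-contained
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    batch_size=32)

def trainer(model, optimizer, criterion, epoch, callback=None):
    model.train()
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        # the callback provided by the pruner goes right after loss.backward()
        if callback:
            callback()
        optimizer.step()
```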
Contributor

It is not clear what "the first optimization subproblem" is; better to give a little more description in the introduction of this pruner.

Member Author

Added.

- **optimize_iteration:** Number of ADMM optimization iterations.
- **training_epochs:** Training epochs of the first optimization subproblem.
- **row:** Penalty parameter for ADMM training.
- **base_algo:** Base pruning algorithm. 'level', 'l1' or 'l2', by default 'l1'.
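
A rough usage sketch with these arguments (assuming the constructor accepts them as keyword parameters named as above; `model` and `trainer` are defined elsewhere, e.g. as in the trainer sketch earlier):

```python
from nni.compression.torch import ADMMPruner

# hypothetical config: prune 50% of the weights of all Conv2d layers
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

pruner = ADMMPruner(model, config_list, trainer=trainer,
                    optimize_iteration=30, training_epochs=5,
                    row=1e-4, base_algo='l1')
model = pruner.compress()
```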
Contributor

Why does this one not have experiment_data_dir?

Member Author

ADMMPruner is not an auto pruner, so no experiment data is generated. I added more explanation of what is included as experiment data for the auto pruners.



## AutoCompress Pruner
For each round t, AutoCompressPruner prune the model for the same sparsity each round to achive the ovrall sparsity:
Contributor

ovrall -> overall

Member Author

Fixed

- **sparsity:** The target percentage of convolutional filters to be pruned.
- **op_types:** "Conv2d" or "default".
- **trainer:** Function used for the first optimization subproblem.
Contributor

It is not clear how to write the trainer. Who should provide the callback? What is the reason for providing a callback? Why should it be put after loss.backward?

Member Author

updated

This function should take `model, optimizer, criterion, epoch, callback` as parameters, where `callback` should be called right after `loss.backward()` in the normal training process.
- **evaluator:** Function to evaluate the masked model. This function should take `model` as its only parameter and return a scalar value.
- **dummy_input:** The dummy input for model speedup; users should put it on the right device before passing it in.
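
For illustration, a possible evaluator and dummy input (the test data, input shape and device are hypothetical placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical held-out data and device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
test_loader = DataLoader(
    TensorDataset(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))),
    batch_size=16)

def evaluator(model):
    # takes only the model and returns a single scalar, e.g. top-1 accuracy
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            correct += (model(data).argmax(dim=1) == target).sum().item()
            total += target.size(0)
    return correct / total

# dummy input placed on the same device as the model before being passed in
dummy_input = torch.randn(1, 1, 28, 28).to(device)
```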
Contributor

Why is there model speedup here?

Member Author

Speedup is called inside AutoCompress to keep the model unmasked and realize real pruning after each iteration.
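
Roughly, that internal step looks like the sketch below (assuming NNI's ModelSpeedup utility; the import path and the mask file name are placeholders that may differ from the actual code):

```python
from nni.compression.torch import ModelSpeedup

# replace masked filters/channels with physically smaller layers, so the next
# iteration starts from an unmasked, really-pruned model
m_speedup = ModelSpeedup(model, dummy_input, 'mask.pth')
m_speedup.speedup_model()
```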

- **dummy_input:** The dummy input for model speedup; users should put it on the right device before passing it in.
- **iterations:** The number of overall iterations.
- **optimize_mode:** Optimize mode, 'maximize' or 'minimize', by default 'maximize'.
Contributor

only this auto pruner supports optimize_mode?

Member Author

optimize_mode is supported in NetAdapt, SimulatedAnnealing and AutoCompress. Sorry for having missed this arg for NetAdaptPruner.

- **cool_down_rate:** Simulated Annealing related parameter.
- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.
- **optimize_iteration:** Number of ADMM optimization iterations.
Contributor

what is the relation with ADMM?

Member Author

AutoCompressPruner calls SimulatedAnnealingPruner and ADMMPruner iteratively.
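
In other words, each overall iteration roughly does the following (a simplified sketch; the helper names are hypothetical placeholders, not real NNI APIs):

```python
def autocompress(model, overall_sparsity, iterations):
    # the same per-round ratio so that the rounds compound to the overall target
    sparsity_each_round = 1 - (1 - overall_sparsity) ** (1 / iterations)
    for _ in range(iterations):
        config = simulated_annealing_search(model, sparsity_each_round)  # per-layer sparsities
        model = admm_prune(model, config)   # prune according to that distribution
        model = speed_up(model)             # realize real pruning, drop the masks
        fine_tune(model)                    # short fine-tuning before the next round
    return model
```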

- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.
- **optimize_iteration:** Number of ADMM optimization iterations.
- **epochs:** Training epochs of the first optimization subproblem.
Contributor

this one also has two subproblems?

Member Author

These are args for ADMM

"""
_logger.info('Starting AutoCompress pruning...')

sparsity_each_round = 1 - pow(1-self._sparsity, 1/self._optimize_iterations)
Contributor

Why use this sparsity strategy?

Member Author

This strategy applies the same sparsity ratio to the remaining weights in each iteration, so that the overall target sparsity is reached after the final iteration.
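
Concretely, the per-round ratio is chosen so that the remaining-weight fractions compound to the target, i.e. `(1 - sparsity_each_round) ** N == 1 - sparsity`. A quick sanity check:

```python
sparsity, iterations = 0.875, 3
sparsity_each_round = 1 - pow(1 - sparsity, 1 / iterations)  # 0.5
achieved = 1 - (1 - sparsity_each_round) ** iterations       # 0.875, equals the target
print(sparsity_each_round, achieved)
```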

1. Con = Res_i - delta_Res
2. for every layer:
Choose Num Filters to prune
Choose which filter to prunee
Contributor

prunee -> prune

Member Author

fixed

and fine-tune the model for a short period after each pruning iteration.
optimize_mode : str
optimize mode, 'maximize' or 'minimize', by default 'maximize'
base_algo : str
Contributor

better to add a description that we use base_algo to choose which filter to prune

Member Author

added

@@ -398,5 +402,176 @@ We try to reproduce the experiment result of the fully connected network on MNIS
The above figure shows the result of the fully connected network. `round0-sparsity-0.0` is the performance without pruning. Consistent with the paper, pruning around 80% obtains performance similar to non-pruning and converges a little faster. If we prune too much, e.g., more than 94%, the accuracy becomes lower and convergence becomes a little slower. Slightly different from the paper, the trend of the data in the paper is clearer.


## NetAdapt Pruner
Contributor

@chicm-ms chicm-ms Jun 30, 2020

The order of each section should be consistent with the content directory/list at the beginning.

Member Author

fixed


# use speed up to prune the model before next iteration, because SimulatedAnnealingPruner & ADMMPruner don't take masked models
self._model_to_prune.load_state_dict(torch.load(os.path.join(
self._experiment_data_dir, 'model_admm_masked.pth')))
Contributor

Why reload the checkpoint?

Member Author

The model weights have changed after ADMM pruning.

Penalty parameters for ADMM training.
base_algo : str
Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
Given the sparsity distrution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
Contributor

distrution -> distribution

Member Author

fixed

@chicm-ms chicm-ms merged commit f5caa19 into microsoft:master Jun 30, 2020
@chicm-ms chicm-ms mentioned this pull request Jul 1, 2020