
Add schedule&runtime tutorial doc #499

Merged: 33 commits merged into open-mmlab:master on Nov 17, 2021

Conversation

@Ezra-Yu (Collaborator) commented Oct 22, 2021

Motivation

Add schedule and runtime tutorial documentation.

Modification

  1. Add schedule and runtime tutorial documentation.
  2. Add api.models.heads link in CN_doc.

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

codecov bot commented Oct 22, 2021

Codecov Report

Merging #499 (18550a2) into master (dc35eb6) will increase coverage by 0.39%.
The diff coverage is n/a.


```
@@            Coverage Diff             @@
##           master     #499      +/-   ##
==========================================
+ Coverage   79.48%   79.87%   +0.39%
==========================================
  Files         106      107       +1
  Lines        5975     6093     +118
  Branches      968      987      +19
==========================================
+ Hits         4749     4867     +118
+ Misses       1095     1094       -1
- Partials      131      132       +1
```

| Flag | Coverage Δ |
| --- | --- |
| unittests | 79.87% <ø> (+0.39%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| mmcls/models/backbones/timm_backbone.py | 78.94% <0.00%> (-2.01%) ⬇️ |
| mmcls/models/heads/cls_head.py | 83.33% <0.00%> (ø) |
| mmcls/models/backbones/__init__.py | 100.00% <0.00%> (ø) |
| mmcls/models/backbones/mlp_mixer.py | 95.45% <0.00%> (ø) |
| mmcls/models/losses/cross_entropy_loss.py | 98.33% <0.00%> (+0.15%) ⬆️ |
| mmcls/apis/inference.py | 20.00% <0.00%> (+0.35%) ⬆️ |
| mmcls/datasets/pipelines/transforms.py | 88.17% <0.00%> (+1.36%) ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dc35eb6...18550a2. Read the comment docs.

@Ezra-Yu Ezra-Yu requested a review from mzr1996 October 27, 2021 10:02
Comment on lines 79 to 81
Create the `mmcls/core/optimizer` folder and the `mmcls/core/optimizer/__init__.py` file.
The newly defined module should be imported in `mmcls/core/optimizer/__init__.py` so that the registry will
find the new module and add it:

Users also need to add `from .optimizer import *` into `mmcls/core/__init__.py` to register the optimizer.
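A minimal sketch of what this registration would look like, assuming a hypothetical `MyOptimizer` defined in `mmcls/core/optimizer/my_optimizer.py` (names are illustrative, not from this PR):

```python
# mmcls/core/optimizer/__init__.py
from .my_optimizer import MyOptimizer  # hypothetical custom optimizer

__all__ = ['MyOptimizer']

# mmcls/core/__init__.py
from .optimizer import *  # noqa: F401,F403  (makes the registry see MyOptimizer)
```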

docs/tutorials/customize_runtime.md (outdated, resolved)

The default optimizer constructor is implemented [here](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/optimizer/default_constructor.py#L11),
which could also serve as a template for new optimizer constructor.

The `DefaultOptimizerConstructor` supports `paramwise_cfg`; please add an example of how to use it in the config file.
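For illustration, a hedged example of what such a `paramwise_cfg` snippet could look like in a config file, assuming the installed MMCV supports `norm_decay_mult` and `custom_keys` (values are arbitrary):

```python
# Disable weight decay for normalization layers and use a 10x smaller
# learning rate for all backbone parameters.
optimizer = dict(
    type='SGD',
    lr=0.01,
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        norm_decay_mult=0.0,  # no weight decay on norm layers
        custom_keys={'backbone': dict(lr_mult=0.1)}))  # smaller lr for backbone
```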


## Customize Training Schedules

we use step learning rate with default value in config files, this calls [`StepLRHook`](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L153) in MMCV.

we -> We


so that 1 epoch for training and 1 epoch for validation will be run iteratively.

:::{note}

Use `` ```{note} `` instead of `:::{note}`.

You can also set the priority of the hook by adding key `priority` to `'NORMAL'` or `'HIGHEST'` as below

'NORMAL' or 'HIGHEST'?
Here is the priority level table, and users can also use a specific value to modify priority finely.

| Level | Value |
| --- | --- |
| HIGHEST | 0 |
| VERY_HIGH | 10 |
| HIGH | 30 |
| ABOVE_NORMAL | 40 |
| NORMAL | 50 |
| BELOW_NORMAL | 60 |
| LOW | 70 |
| VERY_LOW | 90 |
| LOWEST | 100 |
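For instance, a numeric priority could be assigned like this (a sketch with a hypothetical hook name):

```python
# Any integer in [0, 100] is accepted; 35 sits between HIGH (30) and ABOVE_NORMAL (40).
custom_hooks = [dict(type='MyHook', priority=35)]
```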

docs/tutorials/customize_runtime.md (outdated, resolved)
The above-mentioned tutorials already cover how to modify `optimizer_config`, `momentum_config`, and `lr_config`.
Here we show what we can do with `log_config`, `checkpoint_config`, and `evaluation`.

#### Checkpoint config

Please rearrange these sections and rename titles according to the above list.

Comment on lines 140 to 144
Some models need gradient clip to clip the gradients to stabilize the training process. An example is as below:

```python
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```

The `optimizer_config` will be passed to `OptimizerHook`; users may easily confuse it with the `optimizer_cfg` in the optimizer constructor.
Users can also specify different kinds of optimizer hooks here, like `GradientCumulativeOptimizerHook`. Consider adding some introduction here.
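As a sketch of such an introduction, switching to the gradient-cumulative hook could look roughly like this (values are illustrative):

```python
# Accumulate gradients over 4 iterations before each optimizer step,
# giving an effective batch size 4 times larger. grad_clip is still
# accepted because this hook extends OptimizerHook.
optimizer_config = dict(
    type='GradientCumulativeOptimizerHook',
    cumulative_iters=4,
    grad_clip=dict(max_norm=35, norm_type=2))
```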

#### Evaluation config

The config of `evaluation` will be used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmclassification/blob/master/mmcls/core/evaluation/eval_hooks.py).
Except the key `interval`, other arguments such as `metrics` will be passed to the `dataset.evaluate()`

The `EvaluationHook` supports `save_best` now; many users may want this feature, so consider adding an introduction about it.
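A hedged example of what such an introduction might show, assuming the accuracy metric is used:

```python
# Evaluate every epoch and keep the checkpoint with the best accuracy.
evaluation = dict(interval=1, metric='accuracy', save_best='accuracy')
```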

Ezra-Yu and others added 3 commits November 3, 2021 14:55
Co-authored-by: Ma Zerun <mzr1996@163.com>
Co-authored-by: Ma Zerun <mzr1996@163.com>
@Ezra-Yu changed the title from "Add custom runtime tutorial doc" to "Add schedule&runtime tutorial doc" on Nov 5, 2021

### Warmup strategy

For warmup with the step learning rate schedule in the config file, the main parameters are the following:

Missing translation


In academic research and industrial practice, it may be necessary to use optimization methods not implemented by MMClassification, and users can add them through the following methods.

```(note)

Use `` ```{note} `` instead of `` ```(note) ``.

Comment on lines 10 to 12
- [CheckpointSaverHook](#checkpointsaverhook)
- [LoggerHooks](#loggerhooks)
- [EvaluationHook](#evaluationhook)

The TOC's links don't match titles.

@@ -0,0 +1,304 @@
# Tutorial 6: Customize Schedule

In this tutorial, we will introduce some methods about how to construct optimizers, customize learning rate and momentum schedules, use multiple learning rates and weight_decay, gradient clipping, gradient accumulation, and customize self-implemented methods for the project.

The underscore is only used in variable names.

checkpoint_config = dict(interval=1)
```

The users could set `max_keep_ckpts` to save only a small number of checkpoints, or decide whether to store the state dict of the optimizer by `save_optimizer`.

"The users" can be used in comments, which are for developers. In the documentation, just use "we" or "you".
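For reference, a checkpoint config using the options mentioned above might look like this (values are illustrative):

```python
# Save a checkpoint every epoch, keep only the 3 most recent ones,
# and store the optimizer state so training can be resumed later.
checkpoint_config = dict(interval=1, max_keep_ckpts=3, save_optimizer=True)
```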


**After completing your configuration file, you could use the [learning rate visualization tool](https://mmclassification.readthedocs.io/zh_CN/latest/tools/visualization.html#id3) to draw the corresponding learning rate adjustment curve.**

## Use multiple learning rates and weight_decays

Suggested change
## Use multiple learning rates and weight_decays
## Parameter-wise finely configuration

Examples are as follows:

```python
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```

Add a comment about `norm_type`, since it's not self-explanatory.
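A hedged sketch of the kind of comment being asked for:

```python
optimizer_config = dict(
    grad_clip=dict(
        max_norm=35,   # clip gradients whose norm exceeds 35
        norm_type=2))  # measure gradient magnitude with the L2 norm
```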

Comment on lines 186 to 189
When the optimizer hook type is not specified, `OptimizerHook` is used by default, and the above is equivalent to:

```python
optimizer_config = dict(type="OptimizerHook", grad_clip=dict(max_norm=35, norm_type=2))
```
@mzr1996 Nov 8, 2021

This section is not relevant to gradient clipping, move it to where users need to change it.


### Gradient accumulation

When computing resources are lacking, BatchSize can only be set to a small value, which affects the effect of the resulting model. Gradient accumulation can be used to circumvent this problem.

BatchSize? Strange uppercase.

Comment on lines 197 to 207
- CosineAnnealing schedule:

```python
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=1000,
warmup_ratio=1.0 / 10,
min_lr_ratio=1e-5)
```


Forgot to replace the example?

@@ -0,0 +1,265 @@
# Tutorial 7: Customize Runtime Settings

In this tutorial, we will introduce some methods about how to customize optimization methods, training schedules, workflow and hooks when running your own settings for the project.

How to customize optimization methods has been moved to tutorial 6


```{note}
1. The parameters of model will not be updated during val epoch.
2. Keyword `total_epochs` in the config only controls the number of training epochs and will not affect the validation workflow.

total_epochs or max_epochs?


## Customize Workflow

By default, we recommend users to use **`EvaluationHook`** to do evaluation after training epoch, but they can still use `val` workflow as an alternative.

This line should be moved after the introduction.
And consider adding a note to remind users that modifying workflow is unnecessary in most situations.

workflow = [('train', 1)]
```

which means running 1 epoch for training.

Watch out for the uppercase.
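For context, the alternative `val` workflow mentioned earlier would be configured roughly like this (a sketch, not text from the PR):

```python
# Run 1 training epoch followed by 1 validation epoch, repeatedly.
workflow = [('train', 1), ('val', 1)]
```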


The hook mechanism is widely used in the OpenMMLab open source algorithm library. Combined with the `Runner`, the entire life cycle of the training process can be managed easily. You can learn more about the hook through [related article](https://www.calltutors.com/blog/what-is-hook/).

Hooks only work when they are registered in the constructor. At present, hooks are mainly divided into two categories:

Suggested change
Hooks only work when they are registered in the constructor. At present, hooks are mainly divided into two categories:
Hooks only work after being registered into the runner. At present, hooks are mainly divided into two categories:
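As a hedged illustration of registering a hook into the runner through the config, assuming a hypothetical `MyHook`:

```python
# e.g. mmcls/core/utils/my_hook.py (hypothetical location; the module must be
# imported somewhere so the registry can find the class)
from mmcv.runner import HOOKS, Hook

@HOOKS.register_module()
class MyHook(Hook):
    def before_run(self, runner):
        runner.logger.info('MyHook is registered and active.')

# In the config file:
custom_hooks = [dict(type='MyHook', priority='NORMAL')]
```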

```{note}
1. In the default configuration files of MMClassification, the evaluation field is generally placed in the datasets configs.

2. 'EvalHook' in 'MMClassification/mmcls/core/evaluation/eval_hooks.py' will be deprecated, recommend to use 'EvaluationHook' in MMCV as above.

This note can be removed because config file modification is not relevant to which implementation is used.


### Use implemented hooks

Some hooks have been already implemented in MMCV 和 MMClassification:

和 -> and


- load_from : only imports model weights, which is mainly used to load pre-trained or trained models;

- resume_from : not only import model weights, but also optimizer information, current epoch information, mainly used to continue training from the breakpoint.

breakpoint -> checkpoint
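For reference, these settings appear in a config roughly as follows (paths are illustrative):

```python
# Initialize from pre-trained weights only:
load_from = 'checkpoints/resnet50_pretrained.pth'

# Or resume an interrupted run, restoring optimizer state and the current epoch:
resume_from = 'work_dirs/my_experiment/latest.pth'
```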


- resume_from : not only import model weights, but also optimizer information, current epoch information, mainly used to continue training from the breakpoint.

- init_cfg.Pretrained : load the model weight, and you can specify a specific ‘key’ layer to load.

Suggested change
- init_cfg.Pretrained : load the model weight, and you can specify a specific ‘key’ layer to load.
- init_cfg.Pretrained : Load weights during weight initialization, and you can specify which module to load. This is usually used when fine-tuning a model.

- init_cfg.Pretrained : load the model weight, and you can specify a specific ‘key’ layer to load.

```{note}
It is recommended to specify pre-training weights using init_cfg.Pretrained when fine-tuning the model.

Add a link to the fine-tuning tutorial.

@Ezra-Yu Ezra-Yu requested a review from mzr1996 November 12, 2021 05:13
@@ -200,6 +200,15 @@ momentum_config = dict(
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```

When inheriting from and modifying a base config, if `grad_clip=None` in the base config, you need to add `_delete_=True`. For more about `_delete_`, refer to [Tutorial 1: How to Write Config Files](https://mmclassification.readthedocs.io/zh_CN/latest/tutorials/config.html#id16). An example is as follows:

“参靠”, “案列” (typos; should be “参考” and “案例”)
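For context, such an override would presumably look like the sketch below (not the exact example from the PR; the base config name is hypothetical):

```python
_base_ = ['./resnet50_b32x8_imagenet.py']  # hypothetical base config with grad_clip=None

# _delete_=True replaces the whole optimizer_config from the base config
# instead of merging into it.
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
```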

@mzr1996 mzr1996 left a comment


LGTM

@mzr1996 mzr1996 merged commit 771d105 into open-mmlab:master Nov 17, 2021
@Ezra-Yu Ezra-Yu deleted the schedule branch July 18, 2022 08:45
mzr1996 added a commit to mzr1996/mmpretrain that referenced this pull request Nov 24, 2022
* add cn tutorials/config.md

* add heads api and doc title link

* Update tutorials index

* Update tutorials index

* Update config.md

* add english version

* Update config.md

* add custom_runtime

* Update docs

* modify title

* modify en to zh_CN in chinses docs

* Update Readme

* fix punctuations

* Update docs/tutorials/customize_runtime.md

Co-authored-by: Ma Zerun <mzr1996@163.com>

* Update docs/tutorials/customize_runtime.md

Co-authored-by: Ma Zerun <mzr1996@163.com>

* split to schedule and runtime

* fix lint

* improve docs after review

* fix TOC

* imporve expersion

* fix an error

* Imporve schedule.md

* Improve runtime.md

* Improve chinese docs.

* Fix toc-tree

* fix en link and add a case of gradient clipping

* fix wrong word

Co-authored-by: Ma Zerun <mzr1996@163.com>