
[Feature] Add some scripts for development. #1257

Merged 3 commits into open-mmlab:dev-1.x on Dec 19, 2022

Conversation

@mzr1996 (Member) commented Dec 12, 2022

Motivation

Add some tools to help development.

Modification

  1. `ckpt_tree.py`: Print a model's structure from nothing but its state dict. It's helpful when you need to compare the keys of two state dicts; a rough sketch of the idea follows the example below.

For example:

python .dev_scripts/ckpt_tree.py ckpt/resnet50_8xb32_in1k_20210831-ea4938fc.pth --depth 3

Output:

(screenshot: the checkpoint's key tree printed to depth 3)
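The actual implementation lives in `.dev_scripts/ckpt_tree.py`; purely as an illustration of the idea (not the script's real code), one could nest the dotted keys of a state dict and print them to a fixed depth:

```python
import torch


def print_key_tree(state_dict, depth=3):
    """Print the dotted keys of a state dict as an indented tree."""
    tree = {}
    for key in state_dict:
        node = tree
        for part in key.split('.'):
            node = node.setdefault(part, {})

    def walk(node, level):
        if level >= depth:
            return
        for name, child in node.items():
            print('    ' * level + name)
            walk(child, level + 1)

    walk(tree, 0)


ckpt = torch.load('ckpt/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
                  map_location='cpu')
# MMClassification checkpoints usually wrap weights in a 'state_dict' field;
# fall back to the raw dict if this one doesn't.
print_key_tree(ckpt.get('state_dict', ckpt), depth=3)
```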

  2. `compare_init.py`: Compare the weight distributions of two state dicts. It runs a Kolmogorov-Smirnov test on each parameter key that the two checkpoints share. It's helpful when you need to align an initialization method with the official implementation; a conceptual sketch follows the example below.

For example:

python .dev_scripts/compare_init.py resnet1_init.pth resnet2_init.pth --show

(screenshots: per-parameter KS-test results, plus the distribution plots enabled by `--show`)
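Conceptually (this is a hand-written sketch, not the script itself), the comparison boils down to a two-sample KS test per shared parameter key, e.g. with `scipy.stats.ks_2samp`:

```python
import torch
from scipy import stats

ckpt_a = torch.load('resnet1_init.pth', map_location='cpu')
ckpt_b = torch.load('resnet2_init.pth', map_location='cpu')
# Unwrap the 'state_dict' field if the checkpoint has one.
sd_a = ckpt_a.get('state_dict', ckpt_a)
sd_b = ckpt_b.get('state_dict', ckpt_b)

for key in sorted(sd_a.keys() & sd_b.keys()):
    a = sd_a[key].float().flatten().numpy()
    b = sd_b[key].float().flatten().numpy()
    statistic, p_value = stats.ks_2samp(a, b)
    # A small p-value suggests the two initializations draw this
    # parameter from different distributions.
    print(f'{key}: KS statistic={statistic:.4f}, p-value={p_value:.4f}')
```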

  3. `generate_readme.py`: Generate a README.md from a model's metafile. A simplified sketch of the idea follows the example output below.

For example:

python .dev_scripts/generate_readme.py configs/beit/metafile.yml

Output:

# BEiT

> [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254)
<!-- [ALGORITHM] -->

## Abstract

We introduce a self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation from Image Transformers. Following BERT developed in the natural language processing area, we propose a masked image modeling task to pretrain vision Transformers. Specifically, each image has two views in our pre-training, i.e., image patches (such as 16x16 pixels), and visual tokens (i.e., discrete tokens). We first "tokenize" the original image into visual tokens. Then we randomly mask some image patches and feed them into the backbone Transformer. The pre-training objective is to recover the original visual tokens based on the corrupted image patches. After pre-training BEiT, we directly fine-tune the model parameters on downstream tasks by appending task layers upon the pretrained encoder. Experimental results on image classification and semantic segmentation show that our model achieves competitive results with previous pre-training methods. For example, base-size BEiT achieves 83.2% top-1 accuracy on ImageNet-1K, significantly outperforming from-scratch DeiT training (81.8%) with the same setup. Moreover, large-size BEiT obtains 86.3% only using ImageNet-1K, even outperforming ViT-L with supervised pre-training on ImageNet-22K (85.2%). The code and pretrained models are available at https://aka.ms/beit.

<div align=center>
<img src="" width="50%"/>
</div>

## Results and models

### ImageNet-1k

|         Model         |  Pretrain  | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
|:---------------------:|:----------:|:---------:|:--------:|:---------:|:---------:|:------:|:--------:|
| beit-base_3rdparty_in1k\* | From scratch | 86.53 | 17.58 | 85.28 | 97.59 | [config](./beit-base-p16_8xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/beit/beit-base_3rdparty_in1k_20221114-c0a4df23.pth) |

*Models with \* are converted from the [official repo](https://github.com/microsoft/unilm/tree/master/beit). The config files of these models are only for inference. We don't ensure these config files' training accuracy and welcome you to contribute your reproduction results.*

## Citation

```bibtex
```
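Again as a rough sketch only (assuming the standard OpenMMLab metafile schema with `Collections` and `Models` entries; the real script renders many more sections), the core idea is to read the YAML metadata and emit markdown such as the results table:

```python
import yaml

with open('configs/beit/metafile.yml') as f:
    metafile = yaml.safe_load(f)

collection = metafile['Collections'][0]
lines = [f"# {collection['Name']}", '', '## Results and models', '']
lines += [
    '| Model | Top-1 (%) | Download |',
    '|:-----:|:---------:|:--------:|',
]
for model in metafile.get('Models', []):
    metrics = model['Results'][0]['Metrics']
    top1 = metrics.get('Top 1 Accuracy', 'N/A')
    lines.append(
        f"| {model['Name']} | {top1} | [model]({model['Weights']}) |")
print('\n'.join(lines))
```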

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

@mzr1996 mzr1996 requested a review from Ezra-Yu December 12, 2022 06:49
@Ezra-Yu (Collaborator) left a comment
LGTM.

@mzr1996 mzr1996 merged commit 0e41636 into open-mmlab:dev-1.x Dec 19, 2022