
[Feature] Add some scripts for development. #1257

Merged 3 commits into open-mmlab:dev-1.x on Dec 19, 2022

Conversation

@mzr1996 (Member) commented Dec 12, 2022

Motivation

Add some tools to help development.

Modification

  1. `ckpt_tree.py`: Print a model's structure from nothing but its state dict. It's helpful when you need to compare the keys of two state dicts; a rough sketch of the idea follows the example below.

For example:

python .dev_scripts/ckpt_tree.py ckpt/resnet50_8xb32_in1k_20210831-ea4938fc.pth --depth 3

Output:

(screenshot: the checkpoint's key tree printed to depth 3)
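The actual implementation lives in `.dev_scripts/ckpt_tree.py`; purely as an illustration of the idea (not the script's real code), one could nest the dotted keys of a state dict and print them to a fixed depth:

```python
import torch


def print_key_tree(state_dict, depth=3):
    """Print the dotted keys of a state dict as an indented tree."""
    tree = {}
    for key in state_dict:
        node = tree
        for part in key.split('.'):
            node = node.setdefault(part, {})

    def walk(node, level):
        if level >= depth:
            return
        for name, child in node.items():
            print('    ' * level + name)
            walk(child, level + 1)

    walk(tree, 0)


ckpt = torch.load('ckpt/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
                  map_location='cpu')
# MMClassification checkpoints usually wrap weights in a 'state_dict' field;
# fall back to the raw dict if this one doesn't.
print_key_tree(ckpt.get('state_dict', ckpt), depth=3)
```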

  2. `compare_init.py`: Compare the weight distributions of two state dicts. It runs a Kolmogorov-Smirnov test on each parameter key that the two checkpoints share. It's helpful when you need to align an initialization method with the official implementation; a conceptual sketch follows the example below.

For example:

python .dev_scripts/compare_init.py resnet1_init.pth resnet2_init.pth --show

(screenshots: per-parameter KS-test results, plus the distribution plots enabled by `--show`)
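Conceptually (this is a hand-written sketch, not the script itself), the comparison boils down to a two-sample KS test per shared parameter key, e.g. with `scipy.stats.ks_2samp`:

```python
import torch
from scipy import stats

ckpt_a = torch.load('resnet1_init.pth', map_location='cpu')
ckpt_b = torch.load('resnet2_init.pth', map_location='cpu')
# Unwrap the 'state_dict' field if the checkpoint has one.
sd_a = ckpt_a.get('state_dict', ckpt_a)
sd_b = ckpt_b.get('state_dict', ckpt_b)

for key in sorted(sd_a.keys() & sd_b.keys()):
    a = sd_a[key].float().flatten().numpy()
    b = sd_b[key].float().flatten().numpy()
    statistic, p_value = stats.ks_2samp(a, b)
    # A small p-value suggests the two initializations draw this
    # parameter from different distributions.
    print(f'{key}: KS statistic={statistic:.4f}, p-value={p_value:.4f}')
```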

  3. `generate_readme.py`: Generate a README.md from a model's metafile. A simplified sketch of the idea follows the example output below.

For example:

python .dev_scripts/generate_readme.py configs/beit/metafile.yml

Output:

# BEiT

> [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254)
<!-- [ALGORITHM] -->

## Abstract

We introduce a self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation from Image Transformers. Following BERT developed in the natural language processing area, we propose a masked image modeling task to pretrain vision Transformers. Specifically, each image has two views in our pre-training, i.e., image patches (such as 16x16 pixels), and visual tokens (i.e., discrete tokens). We first "tokenize" the original image into visual tokens. Then we randomly mask some image patches and feed them into the backbone Transformer. The pre-training objective is to recover the original visual tokens based on the corrupted image patches. After pre-training BEiT, we directly fine-tune the model parameters on downstream tasks by appending task layers upon the pretrained encoder. Experimental results on image classification and semantic segmentation show that our model achieves competitive results with previous pre-training methods. For example, base-size BEiT achieves 83.2% top-1 accuracy on ImageNet-1K, significantly outperforming from-scratch DeiT training (81.8%) with the same setup. Moreover, large-size BEiT obtains 86.3% only using ImageNet-1K, even outperforming ViT-L with supervised pre-training on ImageNet-22K (85.2%). The code and pretrained models are available at https://aka.ms/beit.

<div align=center>
<img src="" width="50%"/>
</div>

## Results and models

### ImageNet-1k

|         Model         |  Pretrain  | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
|:---------------------:|:----------:|:---------:|:--------:|:---------:|:---------:|:------:|:--------:|
| beit-base_3rdparty_in1k\* | From scratch | 86.53 | 17.58 | 85.28 | 97.59 | [config](./beit-base-p16_8xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/beit/beit-base_3rdparty_in1k_20221114-c0a4df23.pth) |

*Models with \* are converted from the [official repo](https://github.com/microsoft/unilm/tree/master/beit). The config files of these models are only for inference. We don't ensure these config files' training accuracy and welcome you to contribute your reproduction results.*

## Citation

```bibtex
```
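Again as a rough sketch only (assuming the standard OpenMMLab metafile schema with `Collections` and `Models` entries; the real script renders many more sections), the core idea is to read the YAML metadata and emit markdown such as the results table:

```python
import yaml

with open('configs/beit/metafile.yml') as f:
    metafile = yaml.safe_load(f)

collection = metafile['Collections'][0]
lines = [f"# {collection['Name']}", '', '## Results and models', '']
lines += [
    '| Model | Top-1 (%) | Download |',
    '|:-----:|:---------:|:--------:|',
]
for model in metafile.get('Models', []):
    metrics = model['Results'][0]['Metrics']
    top1 = metrics.get('Top 1 Accuracy', 'N/A')
    lines.append(
        f"| {model['Name']} | {top1} | [model]({model['Weights']}) |")
print('\n'.join(lines))
```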

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

@mzr1996 mzr1996 requested a review from Ezra-Yu December 12, 2022 06:49
@Ezra-Yu (Collaborator) left a comment
LGTM.

@mzr1996 mzr1996 merged commit 0e41636 into open-mmlab:dev-1.x Dec 19, 2022