[Refactor] Support to use "split" to specify training set/validation set in the ImageNet dataset #1535

zzc98 · 2023-04-28T07:29:37Z

Motivation

Support using split to select the training or validation set in the ImageNet dataset.
Support using split to select the training or test set in the MNIST/Fashion-MNIST datasets.
Support using split to select the training or validation set in the VOC dataset.
Refactor the SUN397 dataset.
Update docs/en/user_guides/dataset_prepare.md and docs/zh_CN/user_guides/dataset_prepare.md.
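
To illustrate the idea behind a split argument, here is a minimal standalone sketch. It is not the code added in this PR: `resolve_split` and `VALID_SPLITS` are hypothetical names, and the layout (annotations at `meta/<split>.txt`, images under `<split>/`) is an assumption chosen for illustration.

```python
import posixpath

# Hypothetical helper, not the implementation from this PR.
# Assumed layout: <data_root>/meta/<split>.txt and <data_root>/<split>/.
VALID_SPLITS = ('train', 'val', 'test')


def resolve_split(data_root, split):
    """Map a split name to (annotation file, image prefix) under data_root."""
    if split not in VALID_SPLITS:
        raise ValueError(
            f'split must be one of {VALID_SPLITS}, got {split!r}')
    ann_file = posixpath.join(data_root, 'meta', f'{split}.txt')
    img_prefix = posixpath.join(data_root, split)
    return ann_file, img_prefix


ann_file, img_prefix = resolve_split('data/imagenet21k', 'train')
print(ann_file)
print(img_prefix)
```

The point of the design is that callers name the subset once instead of passing an annotation file and an image prefix separately, and an unknown split name fails early.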

Examples

# ImageNet
>>> from mmpretrain.datasets import ImageNet
>>> train_dataset = ImageNet(data_root='data/imagenet', split='train')
>>> train_dataset
Dataset ImageNet
	Number of samples:  1281167
	Number of categories:       1000
	Root of dataset:    data/imagenet
>>> test_dataset = ImageNet(data_root='data/imagenet', split='val')
>>> test_dataset
Dataset ImageNet
	Number of samples:  50000
	Number of categories:       1000
	Root of dataset:    data/imagenet

# ImageNet21K
>>> from mmpretrain.datasets import ImageNet21k
>>> train_dataset = ImageNet21k(data_root='data/imagenet21k', split='train')
>>> train_dataset
Dataset ImageNet21k
	Number of samples:  14197088
	Annotation file:    data/imagenet21k/meta/train.txt
	Prefix of images:   data/imagenet21k/train
    
# VOC
>>> from mmpretrain.datasets import VOC
>>> train_dataset = VOC(data_root='data/VOC2007', split='trainval')
>>> train_dataset
Dataset VOC
	Number of samples:  5011
	Number of categories:       20
	Prefix of dataset:  data/VOC2007
	Path of image set:  data/VOC2007/ImageSets/Main/trainval.txt
	Prefix of images:   data/VOC2007/JPEGImages
	Prefix of annotations:      data/VOC2007/Annotations
>>> test_dataset = VOC(data_root='data/VOC2007', split='test')
>>> test_dataset
Dataset VOC
	Number of samples:  4952
	Number of categories:       20
	Prefix of dataset:  data/VOC2007
	Path of image set:  data/VOC2007/ImageSets/Main/test.txt
	Prefix of images:   data/VOC2007/JPEGImages
	Prefix of annotations:      data/VOC2007/Annotations

# SUN397
>>> from mmpretrain.datasets import SUN397
>>> train_dataset = SUN397(data_root='data/SUN397', split='train')
>>> train_dataset
Dataset SUN397
	Number of samples:  19850
	Number of categories:       397
	Root of dataset:    data/SUN397
>>> test_dataset = SUN397(data_root='data/SUN397', split='test')
>>> test_dataset
Dataset SUN397
	Number of samples:  19850
	Number of categories:       397
	Root of dataset:    data/SUN397
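
In a config file, the same argument would typically be passed through the dataset dict. The fragment below is a sketch that assumes the usual dict-based dataloader config layout; only `type`, `data_root`, and `split` come from the examples above, and the batch size, worker count, sampler settings, and empty pipeline are placeholders.

```python
# Hypothetical config fragment. Only `type`, `data_root`, and `split` are taken
# from the examples above; every other value is a placeholder.
train_dataloader = dict(
    batch_size=32,   # placeholder
    num_workers=4,   # placeholder
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        split='train',
        pipeline=[],  # placeholder: real configs list transforms here
    ),
    sampler=dict(type='DefaultSampler', shuffle=True),
)

# The validation loader differs only in the split name and in not shuffling.
val_dataloader = dict(
    batch_size=32,
    num_workers=4,
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        split='val',
        pipeline=[],
    ),
    sampler=dict(type='DefaultSampler', shuffle=False),
)
```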

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, such as docstrings or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
CLA has been signed and all committers have signed the CLA in this PR.

codecov · 2023-04-28T07:40:29Z

Codecov Report

Patch coverage: 26.19% and project coverage change: -0.99 ⚠️

Comparison is base (f9dcae2) 68.16% compared to head (4dba735) 67.18%.

❗ Current head 4dba735 differs from pull request most recent head 30c95ac. Consider uploading reports for the commit 30c95ac to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1535      +/-   ##
==========================================
- Coverage   68.16%   67.18%   -0.99%     
==========================================
  Files         295      303       +8     
  Lines       23372    23958     +586     
  Branches     3713     3798      +85     
==========================================
+ Hits        15932    16095     +163     
- Misses       6880     7290     +410     
- Partials      560      573      +13
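
The figures in the diff can be cross-checked with a few lines of arithmetic. This sketch assumes coverage is computed as hits divided by the sum of hits, misses, and partials; the numbers are copied from the table above, and `coverage_percent` is a name introduced here for illustration.

```python
# Figures copied from the Codecov diff above.
base = {'hits': 15932, 'misses': 6880, 'partials': 560, 'lines': 23372}
head = {'hits': 16095, 'misses': 7290, 'partials': 573, 'lines': 23958}


def coverage_percent(report):
    """Return hits / (hits + misses + partials) as a percentage."""
    total = report['hits'] + report['misses'] + report['partials']
    # Sanity check: the three buckets should account for every line.
    if total != report['lines']:
        raise ValueError('hits + misses + partials does not match lines')
    return 100.0 * report['hits'] / total


base_cov = coverage_percent(base)
head_cov = coverage_percent(head)
print(f'base {base_cov:.3f}%  head {head_cov:.3f}%  '
      f'change {head_cov - base_cov:+.3f}')
```

Under that assumption the results agree with the reported 68.16%, 67.18%, and -0.99 to within rounding.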

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | `67.18% <26.19%> (-0.99%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| mmpretrain/apis/image_retrieval.py | `21.42% <ø> (ø)` | |
| mmpretrain/datasets/__init__.py | `70.27% <0.00%> (-4.02%)` | ⬇️ |
| mmpretrain/datasets/gqa_dataset.py | `0.00% <0.00%> (ø)` | |
| mmpretrain/datasets/nocaps.py | `0.00% <0.00%> (ø)` | |
| mmpretrain/datasets/scienceqa.py | `0.00% <ø> (ø)` | |
| mmpretrain/models/multimodal/__init__.py | `40.00% <0.00%> (-4.45%)` | ⬇️ |
| ...retrain/models/multimodal/chinese_clip/__init__.py | `0.00% <0.00%> (ø)` | |
| mmpretrain/models/multimodal/chinese_clip/bert.py | `0.00% <0.00%> (ø)` | |
| ...ain/models/multimodal/chinese_clip/chinese_clip.py | `0.00% <0.00%> (ø)` | |
| mmpretrain/models/multimodal/chinese_clip/utils.py | `0.00% <0.00%> (ø)` | |

... and 17 more

... and 1 file with indirect coverage changes


Ezra-Yu

LGTM.

fangyixiao18 requested changes May 5, 2023

mmpretrain/datasets/imagenet.py (resolved)

zzc98 force-pushed the refactor-imagenet branch from 38d745b to b770e09 Compare May 6, 2023 06:18

fangyixiao18 requested changes May 9, 2023

docs/en/user_guides/dataset_prepare.md (resolved)

zzc98 force-pushed the refactor-imagenet branch 2 times, most recently from 1a2f9b2 to 82d6c8f Compare May 22, 2023 06:02

Ezra-Yu approved these changes May 22, 2023

YuanLiuuuuuu and others added 20 commits May 25, 2023 17:05

[Feature]: Add caption

c5ea2cd

[Feature]: Update scienceqa

a909e47

[CI] Add test mim CI. (open-mmlab#879)

0af6e7f

refactor imagenet dataset

a20a283

refactor imagenet dataset

9fd277f

refactor imagenet dataset

869f0a9

update imagenet21k

b3deb9d

update configs

84bd8be

update mnist

3c6a1a0

update dataset_prepare.md

206652f

fix sun397 url and update user_guides/dataset_prepare.md

5832647

update dataset_prepare.md

c49e8a5

fix sun397 dataset

a960bc7

fix sun397

2ffd48c

update chinese dataset_prepare.md

8b9e294

update dataset_prepare.md

6ab1b31

[Refactor] update voc dataset

3f0900a

[Refactor] update voc dataset

11be539

refactor imagenet

1627a7f

refactor imagenet

7bc99e4

zzc98 force-pushed the refactor-imagenet branch from 026e52b to 7bc99e4 Compare May 25, 2023 09:11

use mmengine.fileio

30c95ac

mzr1996 approved these changes Jun 2, 2023

fangyixiao18 approved these changes Jun 2, 2023

fangyixiao18 merged commit bc3c4a3 into open-mmlab:dev Jun 2, 2023
6 of 7 checks passed

zzc98 deleted the refactor-imagenet branch June 2, 2023 03:55
