[Refactor] Support to use "split" to specify training set/validation set in the ImageNet dataset #1535

Merged (21 commits) on Jun 2, 2023

@zzc98 (Contributor) commented Apr 28, 2023

Motivation

  • Support using split to select the training or validation set in the ImageNet dataset
  • Support using split to select the training or test set in the MNIST/Fashion-MNIST datasets
  • Support using split to select the training or validation set in the VOC dataset
  • Refactor the SUN397 dataset
  • Update docs/en/user_guides/dataset_prepare.md and docs/zh_CN/user_guides/dataset_prepare.md

Examples

# ImageNet
>>> from mmpretrain.datasets import ImageNet
>>> train_dataset = ImageNet(data_root='data/imagenet', split='train')
>>> train_dataset
Dataset ImageNet
	Number of samples:  1281167
	Number of categories:       1000
	Root of dataset:    data/imagenet
>>> val_dataset = ImageNet(data_root='data/imagenet', split='val')
>>> val_dataset
Dataset ImageNet
	Number of samples:  50000
	Number of categories:       1000
	Root of dataset:    data/imagenet

# ImageNet21K
>>> from mmpretrain.datasets import ImageNet21k
>>> train_dataset = ImageNet21k(data_root='data/imagenet21k', split='train')
>>> train_dataset
Dataset ImageNet21k
	Number of samples:  14197088
	Annotation file:    data/imagenet21k/meta/train.txt
	Prefix of images:   data/imagenet21k/train
    
# VOC
>>> from mmpretrain.datasets import VOC
>>> train_dataset = VOC(data_root='data/VOC2007', split='trainval')
>>> train_dataset
Dataset VOC
	Number of samples:  5011
	Number of categories:       20
	Prefix of dataset:  data/VOC2007
	Path of image set:  data/VOC2007/ImageSets/Main/trainval.txt
	Prefix of images:   data/VOC2007/JPEGImages
	Prefix of annotations:      data/VOC2007/Annotations
>>> test_dataset = VOC(data_root='data/VOC2007', split='test')
>>> test_dataset
Dataset VOC
	Number of samples:  4952
	Number of categories:       20
	Prefix of dataset:  data/VOC2007
	Path of image set:  data/VOC2007/ImageSets/Main/test.txt
	Prefix of images:   data/VOC2007/JPEGImages
	Prefix of annotations:      data/VOC2007/Annotations

# SUN397
>>> from mmpretrain.datasets import SUN397
>>> train_dataset = SUN397(data_root='data/SUN397', split='train')
>>> train_dataset
Dataset SUN397
	Number of samples:  19850
	Number of categories:       397
	Root of dataset:    data/SUN397
>>> test_dataset = SUN397(data_root='data/SUN397', split='test')
>>> test_dataset
Dataset SUN397
	Number of samples:  19850
	Number of categories:       397
	Root of dataset:    data/SUN397
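
The same split argument can also be set from a config file instead of constructing the dataset in Python. The following is a minimal sketch, assuming the usual MMEngine-style dataloader dicts; the batch sizes, worker counts, and empty pipelines are placeholder values chosen for illustration and are not taken from this PR.

```python
# Sketch of dataloader configs that select ImageNet subsets via `split`.
# Assumption: batch_size, num_workers and the empty pipelines are
# illustrative placeholders; a real config would fill in its own values.
dataset_type = 'ImageNet'
data_root = 'data/imagenet'

train_dataloader = dict(
    batch_size=32,
    num_workers=4,
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        split='train',  # training subset
        pipeline=[],
    ),
    sampler=dict(type='DefaultSampler', shuffle=True),
)

val_dataloader = dict(
    batch_size=32,
    num_workers=4,
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        split='val',  # validation subset
        pipeline=[],
    ),
    sampler=dict(type='DefaultSampler', shuffle=False),
)
```

With this layout, switching between subsets only requires changing the split value; the data root stays the same for both dataloaders.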

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

codecov bot commented Apr 28, 2023

Codecov Report

Patch coverage: 26.19% and project coverage change: -0.99 ⚠️

Comparison is base (f9dcae2) 68.16% compared to head (4dba735) 67.18%.

❗ Current head 4dba735 differs from pull request most recent head 30c95ac. Consider uploading reports for the commit 30c95ac to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1535      +/-   ##
==========================================
- Coverage   68.16%   67.18%   -0.99%     
==========================================
  Files         295      303       +8     
  Lines       23372    23958     +586     
  Branches     3713     3798      +85     
==========================================
+ Hits        15932    16095     +163     
- Misses       6880     7290     +410     
- Partials      560      573      +13     
Flag Coverage Δ
unittests 67.18% <26.19%> (-0.99%) ⬇️

Impacted Files Coverage Δ
mmpretrain/apis/image_retrieval.py 21.42% <ø> (ø)
mmpretrain/datasets/__init__.py 70.27% <0.00%> (-4.02%) ⬇️
mmpretrain/datasets/gqa_dataset.py 0.00% <0.00%> (ø)
mmpretrain/datasets/nocaps.py 0.00% <0.00%> (ø)
mmpretrain/datasets/scienceqa.py 0.00% <ø> (ø)
mmpretrain/models/multimodal/__init__.py 40.00% <0.00%> (-4.45%) ⬇️
...retrain/models/multimodal/chinese_clip/__init__.py 0.00% <0.00%> (ø)
mmpretrain/models/multimodal/chinese_clip/bert.py 0.00% <0.00%> (ø)
...ain/models/multimodal/chinese_clip/chinese_clip.py 0.00% <0.00%> (ø)
mmpretrain/models/multimodal/chinese_clip/utils.py 0.00% <0.00%> (ø)
... and 17 more

... and 1 file with indirect coverage changes

@zzc98 force-pushed the refactor-imagenet branch 2 times, most recently from 1a2f9b2 to 82d6c8f on May 22, 2023 06:02

@Ezra-Yu (Collaborator) left a comment

LGTM.

@fangyixiao18 merged commit bc3c4a3 into open-mmlab:dev on Jun 2, 2023
6 of 7 checks passed
@zzc98 deleted the refactor-imagenet branch on June 2, 2023 03:55
5 participants