[Improve] Speed up data preprocessor. #1064

Merged
merged 6 commits into from
Oct 17, 2022

@mzr1996 mzr1996 commented Sep 30, 2022

Motivation

The original data preprocessor is slow because it casts the label tensor of each sample one by one.

Modification

  1. In ClsDataPreprocessor, only cast the gt_label of the data samples to CUDA instead of the whole data sample, and stack/concatenate the tensors before casting.
  2. Refactor the batch augmentation classes, such as Mixup. They now process the batch inputs and batch scores (one-hot format labels) directly, and no longer accept normal labels or data samples.
  3. Add ClsDataSample serialization override functions. During serialization in ForkingPickler, convert all tensors to NumPy arrays, and convert them back during deserialization. This reduces the number of file descriptors consumed in the dataloader.
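
Modification 2 can be illustrated with a minimal sketch. It is not the code changed in this PR: NumPy arrays stand in for the batched tensors, and the helper names `one_hot` and `mixup_batch` are hypothetical. It only shows the idea of mixing the whole batch of inputs and the one-hot batch scores in a single step, instead of touching labels sample by sample.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer labels of shape (N,) into scores of shape (N, num_classes)."""
    scores = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    scores[np.arange(labels.shape[0]), labels] = 1.0
    return scores


def mixup_batch(inputs, scores, alpha, rng):
    """Mix every sample with a randomly chosen partner from the same batch.

    Hypothetical helper: it takes batch inputs and batch scores only,
    mirroring the interface described in the PR, not its implementation.
    """
    lam = float(rng.beta(alpha, alpha))       # mixing ratio in [0, 1]
    index = rng.permutation(inputs.shape[0])  # partner index for each sample
    mixed_inputs = lam * inputs + (1.0 - lam) * inputs[index]
    mixed_scores = lam * scores + (1.0 - lam) * scores[index]
    return mixed_inputs, mixed_scores


rng = np.random.default_rng(0)
inputs = rng.random((4, 3, 8, 8), dtype=np.float32)  # (N, C, H, W)
labels = np.array([0, 2, 1, 2])
scores = one_hot(labels, num_classes=3)
mixed_inputs, mixed_scores = mixup_batch(inputs, scores, alpha=0.2, rng=rng)
```

Because the one-hot conversion needs the number of classes, it is natural for that value to live with whatever builds the batch scores rather than with each augmentation.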

Here is the speed comparison in MobileNetV2 (batch size 64)

            Inference time   FPS
Original    19.5 ms          3289
New         14.1 ms          4531

BC-breaking (Optional)

The Mixup, CutMix, and ResizeMix classes no longer accept the num_classes argument; it is now an argument of ClsDataPreprocessor.

The config files change as follows:

# --------- Original config --------
model = dict(
    ...
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.2, num_classes=1000),
        dict(type='CutMix', alpha=1.0, num_classes=1000)
    ]),
)

data_preprocessor = dict(
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,
)

# -------- New config --------
model = dict(
    ...
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.2),
        dict(type='CutMix', alpha=1.0)
    ]),
)

data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,
)
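
For downstream configs, the migration above could be automated with a small helper. The sketch below is an assumption-laden illustration, not part of this PR: the function name `migrate_num_classes` is hypothetical, and it assumes the configs are plain Python dicts with `augments` given as a list, as in the example above.

```python
import copy


def migrate_num_classes(model_cfg, preprocessor_cfg):
    """Return new-style copies of the model and data preprocessor configs.

    Hypothetical helper: moves `num_classes` out of every entry in
    `train_cfg.augments` and into the data preprocessor config.
    """
    model_cfg = copy.deepcopy(model_cfg)
    preprocessor_cfg = copy.deepcopy(preprocessor_cfg)

    augments = (model_cfg.get('train_cfg') or {}).get('augments', [])
    found = set()
    for aug in augments:
        if 'num_classes' in aug:
            found.add(aug.pop('num_classes'))

    if len(found) > 1:
        raise ValueError(f'Conflicting num_classes values: {sorted(found)}')
    if found:
        # Keep an explicit preprocessor value if one is already set.
        preprocessor_cfg.setdefault('num_classes', found.pop())
    return model_cfg, preprocessor_cfg


old_model = dict(
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.2, num_classes=1000),
        dict(type='CutMix', alpha=1.0, num_classes=1000),
    ]),
)
old_preprocessor = dict(
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,
)

new_model, new_preprocessor = migrate_num_classes(old_model, old_preprocessor)
```

The helper works on deep copies, so the original config dicts stay untouched and can still be compared against the migrated ones.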

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

@mzr1996 mzr1996 requested a review from tonysy September 30, 2022 01:11
@mzr1996 mzr1996 added the 1.0rc Functionalities for MMClassification 1.0rc label Sep 30, 2022

codecov bot commented Sep 30, 2022

Codecov Report

Base: 0.02% // Head: 91.32% // Increases project coverage by +91.29% 🎉

Coverage data is based on head (08dd17c) compared to base (b8b31e9).
Patch has no changes to coverable lines.

❗ Current head 08dd17c differs from pull request most recent head 833ebe7. Consider uploading reports for the commit 833ebe7 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##           dev-1.x    #1064       +/-   ##
============================================
+ Coverage     0.02%   91.32%   +91.29%     
============================================
  Files          121      128        +7     
  Lines         8217     9509     +1292     
  Branches      1368     1498      +130     
============================================
+ Hits             2     8684     +8682     
+ Misses        8215      639     -7576     
- Partials         0      186      +186     
Flag        Coverage Δ
unittests   91.32% <ø> (+91.29%) ⬆️

Impacted Files                                   Coverage Δ
mmcls/apis/inference.py                          0.00% <0.00%> (ø)
mmcls/datasets/transforms/compose.py
mmcls/models/backbones/swin_transformer_v2.py    89.47% <0.00%> (ø)
mmcls/models/backbones/efficientformer.py        95.08% <0.00%> (ø)
mmcls/models/heads/efficientformer_head.py       93.10% <0.00%> (ø)
mmcls/models/backbones/edgenext.py               95.20% <0.00%> (ø)
mmcls/models/utils/layer_scale.py                86.66% <0.00%> (ø)
mmcls/models/backbones/mvit.py                   92.46% <0.00%> (ø)
mmcls/models/backbones/mobileone.py              94.47% <0.00%> (ø)
mmcls/structures/utils.py                        77.77% <0.00%> (ø)
... and 118 more

@@ -3,6 +3,7 @@

This line can be removed to stay consistent with the other config files.

data['inputs'] = inputs

return data
data_samples = data.get('data_samples', None)

To help users understand our codebase, we may need to update the documentation with a description of the data_samples used in MMClassification (as text or as a link to the related MMEngine documentation). This could be added to the documentation TODO list.

We can create an issue and add it to the October TODO list.

@tonysy tonysy left a comment

LGTM

@mzr1996 mzr1996 merged commit 29f066f into open-mmlab:dev-1.x Oct 17, 2022
mzr1996 added a commit to mzr1996/mmpretrain that referenced this pull request Nov 24, 2022
* [Improve] Speed up data preprocessor.

* Add ClsDataSample serialization override functions.

* Add unit tests

* Modify configs to fit new mixup args.

* Fix `num_classes` of the ImageNet-21k config.

* Update docs.