
fix performance issue in convnext DDP train #1098

Merged
merged 1 commit into from
Oct 17, 2022

Conversation

cybergeek2077

@cybergeek2077 cybergeek2077 commented Oct 16, 2022

Thanks for your contribution and we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

Motivation

The following warning appears when using dist_train.sh to train ConvNeXt:

[W reducer.cpp:347] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.

Modification

Call contiguous() after permute() in LayerNorm. In my test, the performance actually improved 3x.
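A minimal sketch of the kind of change described, assuming a channels-first LayerNorm like the one used in ConvNeXt (the class name `LayerNorm2d` here is illustrative, not necessarily the identifier in the repository):

```python
import torch
import torch.nn as nn


class LayerNorm2d(nn.LayerNorm):
    """LayerNorm over the channel dimension of an NCHW tensor."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Permute to NHWC so nn.LayerNorm normalizes the last (channel) dim.
        x = x.permute(0, 2, 3, 1)
        x = super().forward(x)
        x = x.permute(0, 3, 1, 2)
        # permute() only changes strides, so the result (and its gradient) is
        # non-contiguous. Calling .contiguous() materializes a dense NCHW
        # layout, so grad strides match DDP's bucket views and the
        # "Grad strides do not match bucket view strides" warning goes away.
        return x.contiguous()
```

Without the final `.contiguous()`, DDP must copy each non-contiguous gradient into its flat communication buckets, which is the performance cost the warning refers to.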

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

 to fix performance issue in convnext DDP train
@CLAassistant

CLAassistant commented Oct 16, 2022

CLA assistant check
All committers have signed the CLA.

Member

@mzr1996 mzr1996 left a comment


LGTM, I have tested it and this modification can slightly accelerate the training.

@mzr1996 mzr1996 changed the base branch from master to dev October 17, 2022 02:09
@mzr1996 mzr1996 merged commit 38040d5 into open-mmlab:dev Oct 17, 2022
@kamzero

kamzero commented Feb 7, 2023

the performance actually improved 3x in my test

Hi! May I ask whether this warning only affects the training speed and convergence speed, or does it also affect accuracy?

@cybergeek2077
Author

the performance actually improved 3x in my test

Hi! May I ask whether this warning only affects the training speed and convergence speed, or does it also affect accuracy?

I did not run an ablation experiment, but the model I trained before fixing the bug works normally, so I think it only affects the training speed.

@OpenMMLab-Assistant005

Hi @790475019! First of all, we want to express our gratitude for your significant PR in this project. Your contribution is highly appreciated, and we are grateful for your efforts in helping improve this open-source project during your personal time. We believe that many developers will benefit from your PR.

We would also like to invite you to join our Special Interest Group (SIG) private channel on Discord, where you can share your experiences, ideas, and build connections with like-minded peers. To join the SIG channel, simply message the moderator OpenMMLab on Discord, or briefly share your open-source contributions in the #introductions channel and we will assist you. We look forward to seeing you there! Join us: https://discord.gg/UjgXkPWNqA

If you have a WeChat account, welcome to join our community on WeChat by adding our assistant: openmmlabwx. Please add "mmsig + GitHub ID" as a remark when adding friends :)
Thank you again for your contribution ❤
