
[feat] FSDP: add auto_wrap_bn #531

Merged 2 commits into master on Mar 18, 2021

Conversation

min-xu-ai (Contributor):

  • add a utility function to handle wrapping of BN (see the usage sketch below)

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes # (issue).

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

- add a utility function to handle wrapping of BN
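To make the new utility concrete, here is a hedged usage sketch. The import path of auto_wrap_bn is an assumption (it has moved between fairscale versions), and the toy model is illustrative only:

    import torch.nn as nn
    from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
    from fairscale.nn.data_parallel import auto_wrap_bn  # assumed import path

    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
    model = auto_wrap_bn(model)  # BN layers get their own FSDP wrappers first
    model = FSDP(model)          # then shard the rest of the model as usual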
@min-xu-ai min-xu-ai requested a review from myleott March 18, 2021 04:33
@facebook-github-bot added the CLA Signed label Mar 18, 2021
        return not isinstance(module, tuple(default_auto_wrap_policy.FORCE_LEAF_MODULES))  # type: ignore
    else:
        return is_bn and not isinstance(module, tuple(default_auto_wrap_policy.EXCLUDE_WRAP_MODULES))  # type: ignore
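For context, a self-contained sketch of the kind of policy function this fragment belongs to. The function name and signature here are assumptions modeled on fairscale's default_auto_wrap_policy; FORCE_LEAF_MODULES and EXCLUDE_WRAP_MODULES are the sets of module types referenced in the fragment above:

    import torch.nn as nn
    from fairscale.nn.wrap import default_auto_wrap_policy  # assumed import path

    # Hedged sketch: recurse through the module tree, but only report
    # BatchNorm layers as wrappable.
    def wrap_bn_only_policy(module: nn.Module, recurse: bool, unwrapped_params: int) -> bool:
        is_bn = isinstance(module, nn.modules.batchnorm._BatchNorm)
        if recurse:
            # keep descending, except into module types forced to be leaves
            return not isinstance(module, tuple(default_auto_wrap_policy.FORCE_LEAF_MODULES))
        else:
            # wrap only BN layers that are not explicitly excluded
            return is_bn and not isinstance(module, tuple(default_auto_wrap_policy.EXCLUDE_WRAP_MODULES))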

Contributor:

Nice, I find the next couple of lines (config, single group, then guided auto wrap) very elegant.

A reviewer:

Quick question: could dist.new_group(ranks=[my_rank]) impact performance in any way?

Contributor:

No, it should not, really; AFAIK the overhead is minimal.

Contributor Author:

I don't think there should be any perf impact since FSDP has special casing for world_size == 1. But perhaps @myleott can think of something else?

Contributor:

I understood the question as being about the perf cost of having many groups vs. few in PyTorch distributed generally, not anything specific to FSDP. I might be wrong, but that was the reasoning behind my reply.
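For context, the pattern under discussion gives each rank its own single-rank process group, so that wrapped BN layers effectively run at world_size == 1. A minimal sketch, assuming torch.distributed is already initialized; note that new_group() is a collective call, so every rank must create every group:

    import torch.distributed as dist

    # Every rank must call new_group() for each group being created,
    # even for groups it will not belong to (the call is collective).
    world_size = dist.get_world_size()
    single_rank_groups = [dist.new_group(ranks=[r]) for r in range(world_size)]
    my_group = single_rank_groups[dist.get_rank()]
    # Passing process_group=my_group to FSDP makes it see world_size == 1,
    # which is why the thread expects negligible overhead.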

@@ -54,7 +45,16 @@ def forward(self, x):
# TODO (Min): check DDP equivalency.
Contributor:

Having been burnt a little by that, I would recommend not waiting too long for that part.

Contributor Author:

Definitely, see my plan below.

@blefaudeux (Contributor) left a comment:

Looks very good to me. I guess it's good that the others have a look though, as I'm probably missing some context, but it seems very clean and reasonable.

@myleott (Contributor) left a comment:

+1 LGTM 😄


@min-xu-ai (Contributor Author):

Thank you, @blefaudeux, @myleott, @tchaton, for the quick and high-quality reviews. To forecast a bit:

  1. After this is merged, I will make a new release which VISSL can depend on.
  2. I will make a VISSL PR so that it can use the new flatten FSDP integration I have been preparing and testing for a while. That PR is to enable Lei, Giri, and others to do more testing.
  3. After that, I am going to add a DDP parity check here in the test. I plan to get a bit fancy with pytest.fixture: we can run DDP once per test session and reuse the results for all test cases to check against (see the sketch below).
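A minimal sketch of that fixture idea, with hypothetical helpers run_ddp_reference() and run_fsdp_training() standing in for the real training loops (both assumed to return a final state dict):

    import pytest
    import torch

    # Hypothetical sketch of item 3 above; run_ddp_reference/run_fsdp_training
    # are placeholder names, not real fairscale test helpers.
    @pytest.fixture(scope="session")
    def ddp_reference():
        # Session scope: the DDP baseline trains once and is reused by
        # every parity test case in the session.
        return run_ddp_reference()

    def test_fsdp_matches_ddp(ddp_reference):
        fsdp_state = run_fsdp_training()
        for name, ref in ddp_reference.items():
            assert torch.allclose(ref, fsdp_state[name]), name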

@min-xu-ai min-xu-ai merged commit 8b59267 into master Mar 18, 2021
@min-xu-ai min-xu-ai deleted the min/auto_wrap_bn branch March 18, 2021 16:15