[feat] experimental: Add spectrain support #372

Merged: 3 commits, Mar 10, 2021

@sidgoyal78 sidgoyal78 commented Feb 8, 2021

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Adds support for training models using SpecTrain-based asynchronous pipelining. Reference: https://arxiv.org/pdf/1809.02839.pdf
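
For readers unfamiliar with the referenced paper, here is a rough sketch of the weight-prediction idea as I read it: each pipeline stage keeps a smoothed (momentum-style) gradient and uses it to guess the weights a few optimizer steps ahead, so that a stale forward or backward pass runs on weights closer to the ones that will actually be in place. This is not the code in this PR; the function names, the plain-list representation, and the `version_gap` parameter are assumptions made for illustration only.

```python
from typing import List


def update_smoothed_grad(smoothed: List[float], grad: List[float], gamma: float) -> List[float]:
    """Exponential moving average of gradients (momentum-style smoothing)."""
    return [gamma * v + (1.0 - gamma) * g for v, g in zip(smoothed, grad)]


def predict_weights(weights: List[float], smoothed: List[float], lr: float, version_gap: int) -> List[float]:
    """Guess the weights `version_gap` updates ahead: w - version_gap * lr * v.

    `version_gap` is how many optimizer steps are expected to land before
    this pass's result is used (hypothetical parameter name).
    """
    return [w - version_gap * lr * v for w, v in zip(weights, smoothed)]


# Toy example: two scalar "weights" and their current smoothed gradients.
weights = [1.0, 2.0]
smoothed = [0.5, -1.0]
predicted = predict_weights(weights, smoothed, lr=0.25, version_gap=2)
```

With `version_gap=0` the prediction reduces to the current weights, which is the behaviour you would expect for a stage that sees no staleness.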

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@facebook-github-bot added the CLA Signed label Feb 8, 2021
benchmarks/benchmark_dataset.py (outdated)
@@ -0,0 +1,56 @@
import torch
Contributor Author

@msbaines : should I put this file and experimental_ampnet.py inside a benchmarks/experimental/ folder?

Looks like this file (benchmark_dataset.py) was removed earlier.

Contributor

Drive-by comment: as mentioned in another conversation, we refactored the fairscale benchmarks and now have common dataset loaders. We should try to use those unless they don't work for your use case.

Contributor Author

Yeah, that's a great suggestion. I plan to refactor this, but I will first make a PR with xpipe (which depends on the dataloader from this script). Once that is merged, I will do the refactor.

@@ -518,6 +613,7 @@ def bench_mpi(args):
parser.add_argument("--max-batch", type=int, default=4, help="Max number of batches")
parser.add_argument("--socket-name", type=str, default=None, help="socket ifname for gloo/tp")
parser.add_argument("--num-decoder-layers", type=int, default=10, help="Number of decoder layers in the model")
parser.add_argument("--spectrain", action="store_true", default=False, help="Use spectrain based weight prediction")
Contributor

Do we want to enable these benchmarks in CircleCI?

Contributor Author

I think we can do that later.
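
For anyone wiring this into CI later, the `--spectrain` option from the hunk above is a plain `store_true` switch, so it is off unless passed explicitly. A minimal, standalone sketch of how it parses (the parser here is a stand-in with only this one flag, not the benchmark's full argument list):

```python
import argparse

# Stand-in parser containing only the flag added in this PR.
parser = argparse.ArgumentParser(description="benchmark flags (sketch)")
parser.add_argument(
    "--spectrain",
    action="store_true",
    default=False,
    help="Use spectrain based weight prediction",
)

# Without the flag the option is False; with it, True.
default_args = parser.parse_args([])
spectrain_args = parser.parse_args(["--spectrain"])
```

Because the flag defaults to off, existing benchmark invocations should behave as before unless `--spectrain` is added to the command line.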

@sidgoyal78
Contributor Author

@anj-s and @msbaines : Thanks for reviewing the PR. I addressed most of your comments, and it would be great if you could take a final look.

@@ -0,0 +1,58 @@
# Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.
Contributor

Missing license section?

Contributor Author

I think the headers are inconsistent across many scripts. This copyright comment is what most of the other scripts use (see benchmarks/pipe.py, etc.). However, benchmarks/experimental/offload.py has an extra license section.

Let me open an issue so we can address it separately.

@anj-s
Contributor

anj-s commented Mar 9, 2021

> @anj-s and @msbaines : Thanks for reviewing the PR. I addressed most of your comments, and it would be great if you could take a final look.

Thank you @sidgoyal78 for the PR and for making the changes! One more thing: the model can also be reused, similar to benchmarks/pipe.py, when you get to the refactoring.

@sidgoyal78
Contributor Author

@anj-s : I opened issue #506 to address your point about the header/license. Maybe we can discuss it there and I can send out a quick PR.

@sidgoyal78 sidgoyal78 merged commit 5e8a642 into master Mar 10, 2021
@min-xu-ai min-xu-ai deleted the experimental_spectrain branch July 26, 2022 03:38
Labels: CLA Signed
4 participants