-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat] experimental: Add spectrain support #372
Conversation
benchmarks/benchmark_dataset.py
Outdated
@@ -0,0 +1,56 @@ | |||
import torch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msbaines : should I put this file and the experimental_ampnet.py inside an benchmarks/experimental/ folder.
Looks like this file (benchmark_dataset.py) was removed earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive by comment: Mentioned in another conversation, we refactored fairscale benchmarks and have common dataset loaders. We should try and use that unless it doesn't work for your use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a great suggestion. I plan to refactor this, but I will first make a PR with xpipe (which depends on the dataloader from this script). Once we merge that, I will plan to refactor.
benchmarks/experimental_ampnet.py
Outdated
@@ -518,6 +613,7 @@ def bench_mpi(args): | |||
parser.add_argument("--max-batch", type=int, default=4, help="Max number of batches") | |||
parser.add_argument("--socket-name", type=str, default=None, help="socket ifname for gloo/tp") | |||
parser.add_argument("--num-decoder-layers", type=int, default=10, help="Number of decoder layers in the model") | |||
parser.add_argument("--spectrain", action="store_true", default=False, help="Use spectrain based weight prediction") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to enable these benchmarks in circleCI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do that later.
c83de91
to
b647a9d
Compare
@@ -0,0 +1,58 @@ | |||
# Copyright (c) Facebook, Inc. and its affiliates. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing license section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a discrepancy across many scripts. I noticed that this is the comment that is present in most of the other scripts (see, benchmarks/pipe.py, etc). However, there's an extra license section in benchmarks/experimental/offload.py
Let me make an issue and we can address it separately.
thank you @sidgoyal78 for the PR and making changes! Another thing to mention is that the model can also be reused similar to benchmarks/pipe.py when you end up refactoring. |
Before submitting
What does this PR do?
Adds support for training models using spectrain based asynchronous pipelining. Reference: https://arxiv.org/pdf/1809.02839.pdf
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃