
[chore] Fix main breakage temporarily by relaxing constraints #828

Merged: 2 commits into main on Oct 24, 2021

Conversation


@anj-s (Contributor) commented on Oct 23, 2021

What does this PR do?

Fix the main breakage temporarily by relaxing the constraints on the expected throughput.
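For context, a minimal sketch of what such a relaxation typically looks like in a benchmark check; the names and numbers below are illustrative, not the actual values touched by this PR:

```python
# Hypothetical benchmark assertion, loosened so transient CI slowdowns do not
# break main. golden_throughput and the tolerance are illustrative values.
golden_throughput = 430.0  # words/sec recorded on a known-good run

def check_throughput(measured: float, tolerance: float = 0.15) -> None:
    # A tighter tolerance (e.g. 0.05) was tripping on run-to-run noise;
    # relaxing it keeps the regression signal while tolerating variance.
    floor = golden_throughput * (1.0 - tolerance)
    assert measured >= floor, f"throughput regressed: {measured:.1f} < {floor:.1f}"
```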

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

@facebook-github-bot added the CLA Signed label on Oct 23, 2021
@anj-s requested a review from @blefaudeux on October 23, 2021

@blefaudeux (Contributor) left a comment

The tree that keeps on giving... It could be interesting to see whether some NCCL env variable can bring some speed back for broadcast; that could be why this tanked?

@anj-s (Contributor, Author) commented on Oct 24, 2021

> The tree that keeps on giving... It could be interesting to see whether some NCCL env variable can bring some speed back for broadcast; that could be why this tanked?

Sure, but does that explain why there was a regression? I was thinking NCCL_MAX_NRINGS could possibly increase performance, but it would not explain why it dropped.
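A minimal sketch of how one might probe this, assuming a multi-GPU run under torchrun; NCCL_DEBUG, NCCL_ALGO, and NCCL_MAX_NRINGS are real NCCL environment variables, though whether any of them recovers the lost broadcast speed is exactly the open question in this thread:

```python
# Hypothetical experiment: set NCCL knobs before the process group is created,
# then time a broadcast. Run with e.g. `torchrun --nproc_per_node=2 bench.py`
# (the file name is illustrative).
import os

os.environ["NCCL_DEBUG"] = "INFO"    # log which algorithm (ring/tree) NCCL picks
os.environ["NCCL_ALGO"] = "Ring"     # force ring instead of tree, for comparison
os.environ["NCCL_MAX_NRINGS"] = "8"  # legacy knob capping the number of rings

import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))  # set by torchrun

t = torch.ones(1 << 20, device="cuda")
dist.broadcast(t, src=0)  # check the NCCL_DEBUG output for the chosen algorithm
```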

@anj-s merged commit eadfdc4 into main on Oct 24, 2021
@anj-s deleted the fix-main-breakage branch on October 24, 2021
@blefaudeux (Contributor) commented:

> The tree that keeps on giving... It could be interesting to see whether some NCCL env variable can bring some speed back for broadcast; that could be why this tanked?

> Sure, but does that explain why there was a regression? I was thinking NCCL_MAX_NRINGS could possibly increase performance, but it would not explain why it dropped.

I'm guessing that NCCL is statically linked (so it's part of the pytorch-cu release), and it could be that its defaults changed between PyTorch versions? Just a guess; I could well be wrong.
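One way to check that guess is to compare the NCCL version bundled with each PyTorch build; torch.cuda.nccl.version() is an existing PyTorch API, and running this under both 1.8.1 and 1.9.0 would show whether the bundled NCCL changed:

```python
# Compare the NCCL version bundled with each PyTorch build; run under both
# torch 1.8.1 and 1.9.0. The return format varies by PyTorch version
# (an int like 2708 in older releases, a tuple in newer ones).
import torch

print("torch:", torch.__version__)
print("bundled NCCL:", torch.cuda.nccl.version())
```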

vtantia pushed a commit that referenced this pull request on Oct 29, 2021
* relax speed constraints

* relax the regression constraints
Successfully merging this pull request may close this issue:

[benchmarks][bug] PyTorch version update from 1.8.1 to 1.9.0 causes regression in OSS