Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join Horovod workers at the end of trainer.fit() to prevent race conditions following training #1786

Merged
merged 3 commits into from
May 12, 2020

Conversation

tgaddair
Copy link
Contributor

@mergify mergify bot requested a review from a team May 11, 2020 23:25
@tgaddair tgaddair mentioned this pull request May 11, 2020
5 tasks
@Borda Borda added the bug Something isn't working label May 12, 2020
@mergify
Copy link
Contributor

mergify bot commented May 12, 2020

This pull request is now in conflict... :(

1 similar comment
@mergify
Copy link
Contributor

mergify bot commented May 12, 2020

This pull request is now in conflict... :(

@Borda Borda added the ready PRs ready to be merged label May 12, 2020
@Borda Borda added this to the 0.7.6 milestone May 12, 2020
@mergify mergify bot requested a review from a team May 12, 2020 08:30
@Borda Borda mentioned this pull request May 12, 2020
5 tasks
@Borda Borda added the priority: 0 High priority task label May 12, 2020
@codecov
Copy link

codecov bot commented May 12, 2020

Codecov Report

Merging #1786 into master will increase coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1786   +/-   ##
======================================
  Coverage      88%     88%           
======================================
  Files          69      69           
  Lines        4315    4316    +1     
======================================
+ Hits         3804    3805    +1     
  Misses        511     511           

@mergify mergify bot requested a review from a team May 12, 2020 09:14
Copy link
Member

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was requested for review but can only accept with low confidence.
I don't understand why the order of warning/import would cause a race condition.

Generally I would add a test for every bug fix to ensure it does not get re-introduced in the future, but not sure what to suggest here.

otherwise looks ok to me :)

@mergify mergify bot merged commit acab068 into Lightning-AI:master May 12, 2020
@mergify
Copy link
Contributor

mergify bot commented May 12, 2020

Great job! =)

@awaelchli
Copy link
Member

oh, I did not know it auto merges. Shouldn't there be a confirm button?

@Borda
Copy link
Member

Borda commented May 12, 2020

so be careful when you approve anything lol, there is a rule to merge all if tests pass and collect three approvals...

@williamFalcon
Copy link
Contributor

lol. i moved those importa up bc they were failing flake8. but now they are at the bottom and flake passes?? @Borda

@Borda
Copy link
Member

Borda commented May 12, 2020

lol. i moved those importa up bc they were failing flake8. but now they are at the bottom and flake passes?? @Borda

yes, there was added ignore, the case was with new flake8 it started to check all warnings/error... before flake8 raised the most important one...

@tgaddair tgaddair deleted the hvd-join branch May 12, 2020 12:31
@tgaddair
Copy link
Contributor Author

I was requested for review but can only accept with low confidence.
I don't understand why the order of warning/import would cause a race condition.

Generally I would add a test for every bug fix to ensure it does not get re-introduced in the future, but not sure what to suggest here.

otherwise looks ok to me :)

Just to clarify, the race condition fix line is here: https://github.com/PyTorchLightning/pytorch-lightning/pull/1786/files#diff-d673a98c14aa5cd59610bcc447178852R631. The other lines were introduced to make tests pass.

@williamFalcon
Copy link
Contributor

awesome haha. I was commenting on our over-zealous flake8 haha :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority: 0 High priority task ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants