[WIP]: ddp pickle fix for learning rate finder #1834

Closed

Conversation

SkafteNicki
Member

@SkafteNicki SkafteNicki commented May 14, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #1831
This fixes the pickle error on DDP for the learning rate finder, as described in the issue. I can get the learning rate finder to finish its search; however, since the internal state is destroyed after DDP training, so are the logged results. How can I get around this?
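
To illustrate the state loss described above, here is a minimal sketch using only the standard library. `ToyFinder`, `search`, and `run_search_in_child` are hypothetical names, not part of Lightning, and the `fork` start method is an assumption made so the sketch runs as a standalone script on POSIX systems. The point carries over to spawned DDP processes: results recorded inside the child live in the child's copy of the object and are gone once that process exits.

```python
import multiprocessing as mp


class ToyFinder:
    """Hypothetical stand-in for an object that logs results during a search."""

    def __init__(self):
        self.results = []


def search(finder):
    # Runs in the child process and mutates the child's own copy of `finder`.
    finder.results.append({"lr": 1e-3, "loss": 0.5})


def run_search_in_child():
    finder = ToyFinder()
    # Assumption: "fork" is available (POSIX). With "spawn", `finder` would be
    # pickled and rebuilt in the child, which is also a separate copy.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=search, args=(finder,))
    proc.start()
    proc.join()
    # The parent's object was never touched by the child.
    return finder.results


if __name__ == "__main__":
    print(run_search_in_child())
```

The parent still sees an empty `results` list after the child finishes, which is why the logged values need an explicit channel back to the main process.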

@mergify mergify bot requested a review from a team May 14, 2020 13:08
@Borda Borda added the bug Something isn't working label May 14, 2020
@williamFalcon
Contributor

williamFalcon commented May 14, 2020

I think it's time to add a shared-memory queue before going into the process so we can bring stuff back out. This would also solve the test results issue.

Want to add that to this PR?

https://pytorch.org/docs/stable/notes/multiprocessing.html#reuse-buffers-passed-through-a-queue

https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue

Probably create it in .fit, right before the spawn?
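
A rough sketch of the queue idea, assuming the standard-library `multiprocessing` module rather than `torch.multiprocessing`. The names `worker` and `fit_with_queue` and the result dictionaries are hypothetical and do not reflect Lightning's actual code. The `fork` start method is used only so the sketch runs as a standalone script; with `spawn`, the worker would additionally have to be defined at the top level of an importable module so it can be pickled.

```python
import multiprocessing as mp


def worker(rank, queue):
    # Stand-in for the per-process work (e.g. the LR search).
    results = {"rank": rank, "lrs": [1e-4, 1e-3], "losses": [1.0, 0.5 + rank]}
    # Send the results back to the parent instead of keeping them in local state.
    queue.put(results)


def fit_with_queue(num_processes=2):
    ctx = mp.get_context("fork")  # assumption: POSIX system
    queue = ctx.Queue()  # created in the parent, before the processes start
    procs = [
        ctx.Process(target=worker, args=(rank, queue))
        for rank in range(num_processes)
    ]
    for proc in procs:
        proc.start()
    # Drain the queue before joining: joining a process that still has
    # unflushed queue items can deadlock.
    collected = [queue.get(timeout=30) for _ in procs]
    for proc in procs:
        proc.join()
    return sorted(collected, key=lambda item: item["rank"])


if __name__ == "__main__":
    for item in fit_with_queue(2):
        print(item)
```

Creating the queue in the parent and passing it to each process as an argument is what lets the results survive after the children exit.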

@williamFalcon
Contributor

@SkafteNicki shall we merge this now and tackle the queue later?
Or shall we push this fix to 0.7.7?

@Borda Borda added this to the 0.7.7 milestone May 14, 2020
@SkafteNicki
Member Author

Push to 0.7.7, since the PR currently only solves the pickle problem.

@mergify
Contributor

mergify bot commented May 24, 2020

This pull request is now in conflict... :(

@Borda Borda modified the milestones: 0.7.7, 0.8.0 May 26, 2020
@SkafteNicki SkafteNicki mentioned this pull request May 29, 2020
5 tasks
@SkafteNicki SkafteNicki deleted the bugfix/lr_finder_ddp branch October 8, 2020 14:21
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

DDP breaks LR finder
3 participants