Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to provide seed to random generators to ensure reproducibility #1572

Merged
merged 34 commits into from
May 12, 2020

Conversation

kumuji
Copy link
Contributor

@kumuji kumuji commented Apr 23, 2020

I added small function in utilities which imports torch, numpy, python
random and sets seed for all of the libraries to ensure reproducibility
of results.

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Partially fixes #1565 (issue).

This code makes sure to fix seed for all pseudo-random generators, to ensure the same results for different runs. But probably doesn't work fully with ddp .

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team April 23, 2020 14:29
@codecov
Copy link

codecov bot commented Apr 23, 2020

Codecov Report

Merging #1572 into master will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #1572   +/-   ##
======================================
  Coverage      88%     88%           
======================================
  Files          70      70           
  Lines        4346    4346           
======================================
  Hits         3828    3828           
  Misses        518     518           

Copy link
Contributor

@jeremyjordan jeremyjordan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great feature! might i suggest a reorg moving this to pytorch_lightning/seed.py? i think we are eventually trying to get rid of the utilities module in favor of more direct names.

pytorch_lightning/utilities/reproducibility.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team April 24, 2020 01:45
@awaelchli
Copy link
Contributor

awaelchli commented Apr 24, 2020

Nice! Would it maybe be good to document for which libraries the seed is set (numpy, torch, random, ...)? This could maybe go to trainer/__init__.py where the Trainer args are explained in more detail. Also to make it clear it's not set by default.

pytorch_lightning/utilities/reproducibility.py Outdated Show resolved Hide resolved
pytorch_lightning/utilities/reproducibility.py Outdated Show resolved Hide resolved
pytorch_lightning/utilities/reproducibility.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team April 24, 2020 07:52
@Borda
Copy link
Member

Borda commented Apr 24, 2020

@kumuji mind check all examples (probably also elsewhere) and make then use its seeding

@Borda Borda added the feature Is an improvement or enhancement label Apr 24, 2020
@Borda Borda modified the milestones: 0.7.4, 0.7.5 Apr 24, 2020
@Borda
Copy link
Member

Borda commented Apr 24, 2020

i think we are eventually trying to get rid of the utilities module in favor of more direct names.

@jamesjjcondon do you have any suggestions? 🐰 me personally not much fun of having packages with just one module (and fund inside)

@kumuji kumuji requested review from jeremyjordan and justusschock and removed request for a team April 24, 2020 10:20
@mergify mergify bot requested review from a team April 24, 2020 10:20
@kumuji
Copy link
Contributor Author

kumuji commented Apr 24, 2020

Also, I am sorry, I rebased manually on master, but seems like it was not a good idea since there is autorebase in github actions. Should I fix it?

@williamFalcon
Copy link
Contributor

For seeding to work well with ddp the order needs to be:

seed_everything()

# seed should be the first thing done because in ddp, model init needs to \\
# be the same on every process
model()
fit()

@kumuji
Copy link
Contributor Author

kumuji commented Apr 24, 2020

For seeding to work well with ddp the order needs to be:

seed_everything()

# seed should be the first thing done because in ddp, model init needs to \\
# be the same on every process
model()
fit()

In the last commit fixed it in the benchmark code - model was initialized before trainer, where seeding happens

@Borda
Copy link
Member

Borda commented Apr 24, 2020

Also, I am sorry, I rebased manually on master, but seems like it was not a good idea since there is autorebase in github actions. Should I fix it?

it is completely fine, in fact the rebase from action is nothing now.. and also cannot resolve conflicts...

@mergify mergify bot requested a review from a team April 28, 2020 10:35
@awaelchli
Copy link
Contributor

@kumuji We can't really see your changes here on Github, could you rebase onto master, please?

@Borda Borda force-pushed the add_seed_for_reproducibility branch from 48a2234 to 8ef0e05 Compare April 29, 2020 07:24
@Borda Borda requested a review from justusschock April 29, 2020 07:24
@Borda Borda force-pushed the add_seed_for_reproducibility branch from 36b0fc0 to 2c97d5b Compare May 12, 2020 06:21
@Borda
Copy link
Member

Borda commented May 12, 2020

@williamFalcon done ^^

@williamFalcon williamFalcon merged commit 619f984 into Lightning-AI:master May 12, 2020
@williamFalcon
Copy link
Contributor

@kumuji amazing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Is an improvement or enhancement ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to properly fix random seed with pytorch lightning?
8 participants