Checkpointing interval #1272

Merged
merged 11 commits into from
Mar 30, 2020

Conversation

Borda
Member

@Borda Borda commented Mar 28, 2020

What does this PR do?

Fixes #1264. In particular:

  • explicitly count epochs
  • move checkpointing to the end of training when validation is missing
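
For readers skimming the thread, here is a minimal sketch of the two points above. It is not the code from this PR: `IntervalCheckpointer`, `maybe_save`, `run_training`, and `val_checks_per_epoch` are hypothetical names used only for illustration. The assumption is that the checkpointer remembers the epoch of its last save and skips any call that arrives before `period` epochs have passed, so several validation runs within one epoch cannot produce several checkpoints. When there is no validation at all, the loop calls the checkpointer once at the end of the training epoch instead.

```python
from typing import List, Optional


class IntervalCheckpointer:
    """Hypothetical stand-in for a checkpoint callback (not the Lightning API)."""

    def __init__(self, period: int = 1) -> None:
        self.period = period  # minimum number of epochs between two saves
        self.epoch_last_check: Optional[int] = None  # epoch of the last save
        self.saved_epochs: List[int] = []  # stands in for files written to disk

    def maybe_save(self, epoch: int) -> bool:
        # Skip when fewer than `period` epochs have passed since the last save.
        # This also rejects repeated calls within the same epoch.
        if self.epoch_last_check is not None and (epoch - self.epoch_last_check) < self.period:
            return False
        self.epoch_last_check = epoch
        self.saved_epochs.append(epoch)
        return True


def run_training(checkpointer: IntervalCheckpointer, max_epochs: int, val_checks_per_epoch: int) -> List[int]:
    """Toy loop: checkpoint after each validation run, or at the end of the
    training epoch when no validation is configured."""
    for epoch in range(max_epochs):
        if val_checks_per_epoch > 0:
            for _ in range(val_checks_per_epoch):
                # ... validation would run here ...
                checkpointer.maybe_save(epoch)
        else:
            # No validation: fall back to checkpointing at the training end.
            checkpointer.maybe_save(epoch)
    return checkpointer.saved_epochs


# Three validation runs per epoch, period of 2: one save every second epoch.
print(run_training(IntervalCheckpointer(period=2), max_epochs=5, val_checks_per_epoch=3))
```

Counting epochs explicitly keeps the decision independent of how often validation fires, which is the behavior the linked issue asks for.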

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@Borda Borda added the bug Something isn't working label Mar 28, 2020
@Borda Borda added this to the 0.7.2 milestone Mar 28, 2020
@Borda Borda requested a review from a team March 28, 2020 16:01
@pep8speaks

pep8speaks commented Mar 28, 2020

Hello @Borda! Thanks for updating this PR.

Line 203:111: E501 line too long (115 > 110 characters)

Comment last updated at 2020-03-30 22:37:01 UTC

@Borda Borda marked this pull request as ready for review March 28, 2020 19:47
@codecov

codecov bot commented Mar 28, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@3476d2f).
The diff coverage is 85%.

@@           Coverage Diff            @@
##             master   #1272   +/-   ##
========================================
  Coverage          ?     92%           
========================================
  Files             ?      61           
  Lines             ?    3147           
  Branches          ?       0           
========================================
  Hits              ?    2886           
  Misses            ?     261           
  Partials          ?       0

Contributor

@awaelchli awaelchli left a comment

:)

pytorch_lightning/trainer/distrib_data_parallel.py (outdated, resolved)
pytorch_lightning/trainer/distrib_data_parallel.py (outdated, resolved)
pytorch_lightning/trainer/distrib_data_parallel.py (outdated, resolved)
pytorch_lightning/trainer/training_loop.py (resolved)
Borda and others added 4 commits March 28, 2020 23:21
Co-Authored-By: Adrian Wälchli <adrian.waelchli@students.unibe.ch>
@Borda Borda requested review from MattPainter01 and a team March 30, 2020 16:14
@Borda Borda added the ready PRs ready to be merged label Mar 30, 2020
@mergify
Contributor

mergify bot commented Mar 30, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team March 30, 2020 22:33
@williamFalcon williamFalcon merged commit 09167ef into master Mar 30, 2020
@Borda Borda deleted the checkpoint branch March 30, 2020 22:56
alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 3, 2020
* formatting

* formatting

* fix interval

* fix train loop

* fix test

* parametrize test

* Apply suggestions from code review

Co-Authored-By: Adrian Wälchli <adrian.waelchli@students.unibe.ch>

* fix calling

* flake8

* add types

Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch>
Co-authored-by: William Falcon <waf2107@columbia.edu>
@Borda Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021
Labels
bug Something isn't working ready PRs ready to be merged
Development

Successfully merging this pull request may close these issues.

Multiple undesired checkpoints created during single epoch
6 participants