Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow loading checkpoints from urls #1667

Merged
merged 18 commits into from
Jun 11, 2020
Merged

Allow loading checkpoints from urls #1667

merged 18 commits into from
Jun 11, 2020

Conversation

yukw777
Copy link
Contributor

@yukw777 yukw777 commented Apr 29, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #1532. I didn't touch hpc loading.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team April 29, 2020 18:23
@codecov
Copy link

codecov bot commented Apr 29, 2020

Codecov Report

Merging #1667 into master will increase coverage by 0%.
The diff coverage is 92%.

@@          Coverage Diff           @@
##           master   #1667   +/-   ##
======================================
  Coverage      86%     86%           
======================================
  Files          75      76    +1     
  Lines        4788    4797    +9     
======================================
+ Hits         4134    4143    +9     
  Misses        654     654           

@williamFalcon williamFalcon changed the title allow loading checkpoints from urls [WIP] allow loading checkpoints from urls Apr 30, 2020
@Borda Borda added the feature Is an improvement or enhancement label May 26, 2020
@Borda Borda added this to the 0.8.0 milestone May 26, 2020
@Borda
Copy link
Member

Borda commented May 26, 2020

@yukw777 how is it going, it this ready for a review? 🦝

@yukw777
Copy link
Contributor Author

yukw777 commented May 27, 2020

@Borda I wasn’t sure how to write proper tests for this. Can you advise?

@Borda
Copy link
Member

Borda commented May 27, 2020

@Borda I wasn’t sure how to write proper tests for this. Can you advise?

can we somehow load from localhost? otherwise, we shall upload a model to S3 or other storage... but then this test won't work without an internet connection...

@yukw777
Copy link
Contributor Author

yukw777 commented May 27, 2020

@Borda let me see if there’s a way to run a simple file serving server locally for pytest...

@jeremyjordan
Copy link
Contributor

if there’s not, you can always mock out the call to the internet and then have a separate test which marks the test as an integration test (pytest.mark.integration) which should only be run on the CI server

@yukw777 yukw777 marked this pull request as ready for review May 30, 2020 21:39
@yukw777 yukw777 changed the title [WIP] allow loading checkpoints from urls Allow loading checkpoints from urls May 30, 2020
@yukw777
Copy link
Contributor Author

yukw777 commented May 30, 2020

Alright added some tests! I ended up creating a new pytest fixture that runs a simple server that serves tmpdir.

@mergify
Copy link
Contributor

mergify bot commented May 31, 2020

This pull request is now in conflict... :(

@pep8speaks
Copy link

pep8speaks commented May 31, 2020

Hello @yukw777! Thanks for updating this PR.

Line 56:120: E501 line too long (134 > 119 characters)
Line 93:120: E501 line too long (134 > 119 characters)

Comment last updated at 2020-06-11 19:52:35 UTC

@yukw777
Copy link
Contributor Author

yukw777 commented May 31, 2020

CircleCI jobs timed out. They should pass if rerun.

@awaelchli
Copy link
Member

if you rebase, the failing test should pass

@mergify
Copy link
Contributor

mergify bot commented Jun 8, 2020

This pull request is now in conflict... :(

CHANGELOG.md Outdated Show resolved Hide resolved
from urllib.parse import urlparse


def load(path_or_url: str, map_location=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is type of map_location

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't really specify this b/c it can be a lot of different things: #1505

Welcome any suggestions.

tests/trainer/test_trainer.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team June 8, 2020 13:46
@yukw777 yukw777 requested review from Borda and removed request for a team June 8, 2020 14:14
@mergify mergify bot requested a review from a team June 8, 2020 14:15
@Borda Borda modified the milestones: 0.8.0, 0.9.0 Jun 9, 2020
@mergify
Copy link
Contributor

mergify bot commented Jun 9, 2020

This pull request is now in conflict... :(

Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🦝

@mergify mergify bot requested a review from a team June 9, 2020 22:11
CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
@williamFalcon
Copy link
Contributor

this is super cool!
can we document this? add it to the checkpoint section and update the docstrings on the function

@Laksh1997
Copy link

Very keen for this feature!

@mergify mergify bot requested a review from a team June 11, 2020 19:51
@mergify mergify bot requested a review from a team June 11, 2020 19:52
@mergify mergify bot requested a review from a team June 11, 2020 19:53
@Borda Borda requested a review from williamFalcon June 11, 2020 20:01
@williamFalcon williamFalcon merged commit 06cd849 into Lightning-AI:master Jun 11, 2020
@yukw777 yukw777 deleted the feature/1532_load-ckpt-url branch June 11, 2020 21:13
@Laksh1997
Copy link

Awesome stuff, do we know when the next tagged release will be? (which will include this feature)

Also, I assume this feature supports saving to S3 as well?

Awesome stuff!!

@Borda
Copy link
Member

Borda commented Jun 11, 2020

Awesome stuff, do we know when the next tagged release will be? (which will include this feature)

most likely by end of this week...
@williamFalcon adding to v0.8.x

Borda pushed a commit that referenced this pull request Jun 11, 2020
* allow loading checkpoints from urls

* tmpdir_server fixture

* test cases for loading checkpoints from url

* dir => root_dir

* default map_location to None

* test case for resume_from_checkpoint

* changelog

* doc update

* monkeypatch TORCH_HOME to avoid caching

* Use a threading server with random ports so that it is easier to clean up

* test fixes

* pep8 fix

* ThreadingHTTPServer support in 3.6

* pep8 fix

* fix changelog

* separate tests for urls

* typo

Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
(cherry picked from commit 06cd849)
@Borda Borda modified the milestones: 0.9.0, 0.8.0 Jun 18, 2020
@awaelchli awaelchli mentioned this pull request Jun 26, 2020
7 tasks
justusschock pushed a commit that referenced this pull request Jun 29, 2020
* allow loading checkpoints from urls

* tmpdir_server fixture

* test cases for loading checkpoints from url

* dir => root_dir

* default map_location to None

* test case for resume_from_checkpoint

* changelog

* doc update

* monkeypatch TORCH_HOME to avoid caching

* Use a threading server with random ports so that it is easier to clean up

* test fixes

* pep8 fix

* ThreadingHTTPServer support in 3.6

* pep8 fix

* fix changelog

* separate tests for urls

* typo

Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@danielgafni
Copy link

Hello, guys! What's the state of this feature? I'm interested in using s3 with ModelCheckpoint directly. Could I help with the development maybe?

@Borda
Copy link
Member

Borda commented Nov 29, 2020

Hello, guys! What's the state of this feature? I'm interested in using s3 with ModelCheckpoint directly. Could I help with the development maybe?

Hi, it is in there already, mind try latest version?

@danielgafni
Copy link

danielgafni commented Dec 5, 2020

@Borda Hi! Thanks, saving checkpoints to s3 is indeed working! However, logging with TensorboardLogger isn't. Should I make a separate issue for that?

P.S. I suspect it might be a tensorboard issue. It doesn't work with s3 without installing boto3, while TensorBoardLogger is not working without installing a conflicting version of s3fs.

boto3 1.16.30 requires botocore<1.20.0,>=1.19.30, but you'll have botocore 1.17.44 which is incompatible.

@danielgafni
Copy link

I just tried the new version and the issue I talked about is gone! Very cool!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Is an improvement or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

load checkpoint from URL
8 participants