Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Comet Logger pickled behavior #2553

Merged
merged 15 commits into from
Sep 18, 2020

Conversation

Lothiraldan
Copy link
Contributor

What does this PR do?

Hello, I'm working for Comet.ml and we got report that the Comet Logger wasn't working correctly in distributed mode. I was happy to see that the main issue has been solved few days ago: https://github.com/PyTorchLightning/pytorch-lightning/pull/2518/files#diff-a79ed8980a01d44db3ea399541407142. I iterated over that fix to improve the behavior with DDP by saving the experiment ID when pickling, so we can use the same experiment id when recreating, and don't create an Experiment object when trying to set the experiment name or accessing the Logger version.

  • Delay the creation of the actual experiment object for as long as we can.
  • Save the experiment id in case an Experiment object is created so we can
    continue the same experiment in the sub-processes.
  • Run pre-commit on the comet file.

* Delay the creation of the actual experiment object for as long as we can.
* Save the experiment id in case an Experiment object is created so we can
  continue the same experiment in the sub-processes.
* Run pre-commit on the comet file.
@mergify mergify bot requested a review from a team July 8, 2020 17:56
@Lothiraldan Lothiraldan marked this pull request as draft July 8, 2020 18:03
tests/loggers/test_comet.py Outdated Show resolved Hide resolved
pytorch_lightning/loggers/comet.py Outdated Show resolved Hide resolved
pytorch_lightning/loggers/comet.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team July 8, 2020 18:08
@awaelchli
Copy link
Contributor

I also added some tests for loggers including comet over here #2502

Make most Comet Logger attribute protected as they might not reflect the final
Experiment attributes. Also fix the typo in the test name.
@Lothiraldan
Copy link
Contributor Author

I have pushed a fix for the review comments, thank you! I've take a look at #2502, this should fix the test test_loggers_fit_test[CometLogger] failure right?

I've considered removing the name setter but decided to keep it for backward compatibility sake, I think removing it would make it cleaner.

Comment on lines 222 to 224
def name(self) -> Optional[str]:
# don't create an experiment if we don't have one
return self._experiment.project_name if self._experiment else self._project_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests fail because of this here. If the experiment does not exist and the project name is also None, it will return None, and the trainer tries to os.path.join(..., None), that does not work. I would keep it as before. The same applies to the version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code was problematic because it would create an experiment object if None exists. In addition, if project_name is None and no project_name is configured by the user, self._experiment.project_name can also returns None.
Wandb logger seems to also return None in that case https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/loggers/wandb.py#L140, is there a difference between the Wandb logger and the Comet logger that would make returning None for the Logger name problematic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the wandb logger in the tests is called to create the experiment before accessing the name. You are right there is no difference in the definition there. None is problematic when the Trainer is trying to concat the path to the logger directory, so it is doing os.path.join(root, name, version, ...) and if some of these are None it throws.

Why do you think it is problematic that the call to e.g. name or version creates an experiment if it does not exist? The fact that the constructor does not immediately create the experiment means it needs to be triggered as soon as the user tries to interact with the logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of at least 2 scenarios when implicit experiment creation would be problematic:

  • In DDP mode, if one process which is not the rank zero one tries to access name or version, it will create an Experiment that will not receives metrics or parameters. This will slow down the process and add noise as the Experiment (almost empty) summary will be displayed at the end of the training.
  • In DDP mode, if one experiment is created before the actual start of the training, we will have to recreate another experiment in the DDP processes anyway so that will slow down the training and generates extra output too. This particular scenario would still happens if the user tries to access logger.experiment before calling Trainer.fit but I think we should avoid this if we can. I've move the handling of experiment_name to delay the creation of the actual Experiment until we really need it.

As I said before, for the logger name, even if the experiment exists, there is still a chance that the project_name is None. If it is absolutely required for the Logger to returns a non-null string, I will have to think about a solution. Maybe returning a default string.

Copy link
Contributor

@awaelchli awaelchli Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on rank > 0 the experiments don't get called, see the decorator "rank_zero_experiment".

As I said before, for the logger name, even if the experiment exists, there is still a chance that the project_name is None. If it is absolutely required for the Logger to returns a non-null string, I will have to think about a solution. Maybe returning a default string.

Do you think it would be too strict to make the project/experiment name mandatory?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning a default name would be reasonable too. Some other loggers do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've update the code to returns a default name in case no project_name is set. I've also added a couple of tests, I'm not sure if current build failures are linked to my changes or not, I have the impression they are not, could you confirm it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, try to merge master, then they will be triggered again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated my PR with your review comments and merged master. I don't think the current failures are linked to my changes.

@mergify mergify bot requested a review from a team July 9, 2020 02:20
@williamFalcon williamFalcon changed the title Improve Comet Logger pickled behavior [WIP] Improve Comet Logger pickled behavior Jul 9, 2020
@codecov
Copy link

codecov bot commented Jul 10, 2020

Codecov Report

Merging #2553 into master will decrease coverage by 5%.
The diff coverage is 97%.

@@           Coverage Diff           @@
##           master   #2553    +/-   ##
=======================================
- Coverage      91%     86%    -5%     
=======================================
  Files         109     109            
  Lines        8031    8081    +50     
=======================================
- Hits         7291    6939   -352     
- Misses        740    1142   +402     

pytorch_lightning/loggers/comet.py Outdated Show resolved Hide resolved
pytorch_lightning/loggers/comet.py Show resolved Hide resolved
pytorch_lightning/loggers/comet.py Show resolved Hide resolved
@mergify mergify bot requested a review from a team July 18, 2020 03:14
@Borda Borda added this to the 0.9.0 milestone Aug 6, 2020
@Borda Borda added the feature Is an improvement or enhancement label Aug 7, 2020
@Borda
Copy link
Member

Borda commented Aug 7, 2020

@Lothiraldan how is it going, still wip or ready to review? 🐰

@Lothiraldan Lothiraldan marked this pull request as ready for review August 7, 2020 11:54
@Lothiraldan
Copy link
Contributor Author

@Borda I've merged master and fixed the conflict. There was still one open discussion (#2553 (comment)) but I guess it is ready for review.

@Lothiraldan Lothiraldan changed the title [WIP] Improve Comet Logger pickled behavior Improve Comet Logger pickled behavior Aug 7, 2020
Copy link
Contributor

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are solid! thanks for that!
minor comments, but overall LGTM!!

pytorch_lightning/loggers/comet.py Show resolved Hide resolved
pytorch_lightning/loggers/comet.py Show resolved Hide resolved
with patch('pytorch_lightning.loggers.comet.CometExperiment') as comet:
logger = CometLogger(api_key=api_key, experiment_name=experiment_name,)

# The experiment object should not exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you remove these type of comments, also in the other places below? the assertion below makes it very clear

@mergify mergify bot requested a review from a team August 8, 2020 01:17
Copy link
Member

@ethanwharris ethanwharris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, LGTM 😃

@mergify mergify bot requested a review from a team August 9, 2020 06:45
@Lightning-AI Lightning-AI deleted a comment from awaelchli Aug 11, 2020
@Lightning-AI Lightning-AI deleted a comment from ethanwharris Aug 11, 2020
pytorch_lightning/loggers/comet.py Outdated Show resolved Hide resolved
pytorch_lightning/loggers/comet.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team August 11, 2020 09:40
@Borda
Copy link
Member

Borda commented Aug 11, 2020

@Lothiraldan can you pls allow editing your PR? it seems I cannot accept suggestions...

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@Lothiraldan
Copy link
Contributor Author

@Borda I've applied the suggestions and I will remove the extraneous comments. Sorry for the answer time, I was in vacation

@edenlightning edenlightning modified the milestones: 0.9.0, 0.9.x Aug 20, 2020
@SkafteNicki SkafteNicki added the ready PRs ready to be merged label Sep 15, 2020
@mergify
Copy link
Contributor

mergify bot commented Sep 15, 2020

This pull request is now in conflict... :(

@pep8speaks
Copy link

pep8speaks commented Sep 15, 2020

Hello @Lothiraldan! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-18 08:52:27 UTC

@Lothiraldan
Copy link
Contributor Author

The tests error seems unrelated to my changes

tests/loggers/test_comet.py Outdated Show resolved Hide resolved
tests/loggers/test_comet.py Outdated Show resolved Hide resolved
tests/loggers/test_comet.py Outdated Show resolved Hide resolved
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
@Borda Borda removed the ready PRs ready to be merged label Sep 18, 2020
@Borda Borda merged commit e2af4f1 into Lightning-AI:master Sep 18, 2020
@Vozf
Copy link
Contributor

Vozf commented Oct 19, 2020

Well, this changes are breaking everything if I am already using COMET_EXPERIMENT_KEY environmental variable. I'll make an issue with details and a pull request to fix that

@Borda
Copy link
Member

Borda commented Oct 19, 2020

Well, this changes are breaking everything if I am already using COMET_EXPERIMENT_KEY environmental variable. I'll make an issue with details and a pull request to fix that

can you just make a PR if you want to work on the fix directly and refer the problem there... :]

@Vozf
Copy link
Contributor

Vozf commented Oct 19, 2020

I've created a PR already

@Lothiraldan
Copy link
Contributor Author

@Vozf Indeed this use-case was broken by my PR, sorry about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Is an improvement or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants