
setting GPU device #1128

Closed

Conversation

shubhamagarwal92
Contributor

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Related to #609. Filters params for TensorBoard logging, as discussed here.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks

pep8speaks commented Mar 12, 2020

Hello @shubhamagarwal92! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-25 17:04:29 UTC

@shubhamagarwal92
Contributor Author

@Borda

All my changes related to #1094 also got reflected here. I guess we should first resolve that PR and then merge this one.

Sorry for the mess.

pytorch_lightning/loggers/tensorboard.py Outdated
Comment on lines 116 to 117
from six import string_types
from torch.utils.tensorboard.summary import hparams
Member

This should be at the top of the file.

Contributor Author

I agree, but I was mostly following the practices already used in the repo. Do you also want to move this line to the top, then?

from torch.utils.tensorboard.summary import hparams

Member

Yeah, there are a few more things to clean up, lol.

pytorch_lightning/trainer/distrib_parts.py Outdated
pytorch_lightning/trainer/evaluation_loop.py Outdated
@Borda
Member

Borda commented Mar 13, 2020

@PyTorchLightning/core-contributors ^^

@S-aiueo32
Contributor

S-aiueo32 commented Mar 14, 2020

@shubhamagarwal92 @Borda
As mentioned in #1144, every value enters the if-statement once the params are sanitized by #1130, right?

@awaelchli
Member

I'm not sure, but I think this PR overlaps with #1130, at least the hparams part. Correct me if I'm wrong.

@Borda Borda added the "feature" label (Is an improvement or enhancement) Mar 18, 2020
@Borda
Member

Borda commented Mar 18, 2020

As mentioned in #1144, every value enters the if-statement once the params are sanitized by #1130, right?

True, the first part is not needed anymore...
About the root GPU, I'm not sure yet...

@mergify
Contributor

mergify bot commented Mar 24, 2020

This pull request is now in conflict... :(

@tullie
Contributor

tullie commented Mar 25, 2020

It's frustrating that TensorBoard only accepts those types. I have use cases where filtering out the lists and None would be very confusing when looking at the parameters.

How would you feel about converting all unsupported types to strings? I.e.

if not isinstance(v, (int, float, string_types, bool, torch.Tensor)):
    v = str(v)
tensorboard_params[k] = v
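
For context, a minimal self-contained sketch of this suggestion (the function name sanitize_for_tensorboard is hypothetical, not from this PR; the PR-era code used six.string_types instead of str for Python 2 compatibility):

from typing import Any, Dict

import torch

# Types the TensorBoard hparams plugin accepts directly.
_SUPPORTED_TYPES = (int, float, str, bool, torch.Tensor)

def sanitize_for_tensorboard(params: Dict[str, Any]) -> Dict[str, Any]:
    """Convert unsupported hparam values to strings instead of dropping them."""
    tensorboard_params = {}
    for k, v in params.items():
        if not isinstance(v, _SUPPORTED_TYPES):
            v = str(v)  # e.g. [64, 64] -> '[64, 64]', None -> 'None'
        tensorboard_params[k] = v
    return tensorboard_params

With this, a value like [64, 64] shows up in TensorBoard as the string '[64, 64]' rather than disappearing from the hparams table.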

@Borda
Member

Borda commented Mar 25, 2020

Yes, it sounds like a reasonable solution.

@awaelchli
Member

Wasn't this already done in another PR? It was called something like sanitize_hparams.

@awaelchli
Member

awaelchli commented Mar 25, 2020

Found it on master, @tullie. You should probably just merge?

exp, ssi, sei = hparams(params, {})
tensorboard_params = {}
for k, v in params.items():
    if isinstance(v, (int, float, string_types, bool, torch.Tensor)):
Member

I think this is no longer necessary because non-primitive hparams are already converted to strings (see the master branch).

Member

Agree.
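
For context on why the filter existed at all: torch.utils.tensorboard.summary.hparams accepts only int, float, str, bool, and torch.Tensor values and raises on anything else, which is what the filtering (or the str() conversion) works around. A minimal sketch, with illustrative parameter values:

import torch
from torch.utils.tensorboard.summary import hparams

params = {'lr': 0.001, 'layers': [64, 64]}  # a list is not a supported type

try:
    # hparams() builds the protobuf summaries the hparams plugin expects
    exp, ssi, sei = hparams(params, metric_dict={})
except ValueError as err:
    print('unsupported hparam value:', err)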

@Borda
Member

Borda commented Mar 25, 2020

Found it on master, @tullie. You should probably just merge?

I think it was already pointed out in another issue that there are two PRs doing almost the same thing.
The only thing remaining is setting the device... I will rename the PR to reduce confusion.

@Borda Borda changed the title from "Filter params for tensorboard logging" to "setting GPU device" Mar 25, 2020
@Borda
Member

Borda commented Mar 25, 2020

@tullie @awaelchli ready now...

@awaelchli
Member

Now this PR is the same as #1094 by the same author :) One of them should be closed.

# set cuda device to root gpu
root_device = (torch.device("cuda", root_gpu) if root_gpu >= 0 else torch.device("cpu"))
torch.cuda.set_device(root_device)

return root_gpu
Member

To me it looks like this should not go here, because the method is called `determine_root_gpu_device`, and the added code also checks whether the device is the CPU. This code should probably move to the place where we call `determine_root_gpu_device`.

Contributor Author

Where do you suggest placing this?
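
A minimal sketch of the separation suggested above, keeping determine_root_gpu_device a pure lookup and moving the side effect to the call site (the call-site code here is illustrative, not the exact Trainer internals):

import torch

def determine_root_gpu_device(gpus):
    """Return the root GPU id from a list of device ids; no CUDA side effects."""
    if not gpus:
        return None
    assert isinstance(gpus, list), 'gpus should be a list'
    return gpus[0]

# Illustrative call site: the caller owns the side effect.
root_gpu = determine_root_gpu_device([0, 1])
if root_gpu is not None and torch.cuda.is_available():
    torch.cuda.set_device(torch.device('cuda', root_gpu))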

    torch.cuda.set_device(root_device)
else:
    raise RuntimeError(
        'Expected `data_parallel_device_ids` as a list, cannot determine root gpu.'
Member

Above we have `if self.single_gpu`, so why should the device ids be a list?

Contributor Author

There was already an if-clause: `if isinstance(self.data_parallel_device_ids, list)`. I raised the RuntimeError as a safety net, because `root_gpu = 0` was used earlier by default.
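
A hedged sketch of the guard being described, simplified into a standalone function (the real code lives inside the Trainer, so the function below is illustrative):

import torch

def set_root_gpu(data_parallel_device_ids):
    """Set the CUDA device to the root GPU; raise instead of silently
    defaulting to root_gpu = 0 when the device ids are not a list."""
    if isinstance(data_parallel_device_ids, list):
        root_gpu = data_parallel_device_ids[0]
        if root_gpu >= 0 and torch.cuda.is_available():
            torch.cuda.set_device(torch.device('cuda', root_gpu))
        return root_gpu
    raise RuntimeError(
        'Expected `data_parallel_device_ids` as a list, cannot determine root gpu.')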

@Borda Borda closed this Mar 25, 2020
@shubhamagarwal92
Contributor Author

Now this PR is the same as #1094 by the same author :) One of them should be closed.

@awaelchli, you are right; the follow-up but conflicting PR #1130 has already been merged into master. Could you please let me know what should be done with #1094?

PS. @Borda already closed this one.
