Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GpuUsageLogger to work on different platforms #3008

Merged
merged 38 commits into from
Aug 27, 2020
Merged

Fix GpuUsageLogger to work on different platforms #3008

merged 38 commits into from
Aug 27, 2020

Conversation

rohitgr7
Copy link
Contributor

@rohitgr7 rohitgr7 commented Aug 16, 2020

What does this PR do?

Fixes GpuUsageLogger to work on all platforms.
Also renamed it to GPUStatsMonitor: #3008 (comment)
Todo:

  • make the call to nvidia-smi once

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team August 16, 2020 17:42
@codecov
Copy link

codecov bot commented Aug 16, 2020

Codecov Report

Merging #3008 into master will decrease coverage by 9%.
The diff coverage is 85%.

@@           Coverage Diff            @@
##           master   #3008     +/-   ##
========================================
- Coverage      90%     81%     -9%     
========================================
  Files          84      84             
  Lines        8062    9240   +1178     
========================================
+ Hits         7244    7503    +259     
- Misses        818    1737    +919     

@rohitgr7 rohitgr7 changed the title [WIP] Fix gpulogger Fix gpulogger Aug 17, 2020
@rohitgr7 rohitgr7 changed the title Fix gpulogger Fix GpuUsageLogger to work on different platforms Aug 17, 2020
@mergify
Copy link
Contributor

mergify bot commented Aug 17, 2020

This pull request is now in conflict... :(

@Borda Borda added the bug Something isn't working label Aug 17, 2020
Copy link
Contributor

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the docs should say clearly that this is a callback and it uses the logger of the trainer.
Despite the name, we cannot do Trainer(logger=GpuUsageLogger()) which a user might think it is a real logger.

pytorch_lightning/callbacks/gpu_usage_logger.py Outdated Show resolved Hide resolved
pytorch_lightning/callbacks/gpu_usage_logger.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team August 18, 2020 12:13
Copy link
Member

@SkafteNicki SkafteNicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should also add a test that checks that something is actually logged, when used in the proper way. Something like this should work (tested locally):

from pytorch_lightning.loggers import CSVLogger
from pytorch_lightning.loggers.csv_logs import ExperimentWriter
import os

@pytest.mark.skipif(not torch.cuda.is_available(), reason="test requires GPU machine")
def test_gpulogger(tmpdir):
    gpu_usage = GpuUsageLogger()
    logger = CSVLogger(tmpdir)
    model = EvalModelTemplate()
    trainer = Trainer(
        default_root_dir=tmpdir,
        callbacks=[gpu_usage],
        max_epochs=1,
        logger=logger
    )
    trainer.fit(model)
    path_csv = os.path.join(logger.log_dir, ExperimentWriter.NAME_METRICS_FILE)
    with open(path_csv, 'r') as fp:
        lines = fp.readlines()
    header = lines[0].split()

    fields = ['GPU_utilization.gpu',
              'GPU_memory.used',
              'GPU_memory.free',
              'GPU_utilization.memory'
              ]

    for f in fields:
        assert any([f in h for h in header])

@mergify mergify bot requested a review from a team August 18, 2020 13:09
@rohitgr7 rohitgr7 changed the title Fix GpuUsageLogger to work on different platforms [WIP] Fix GpuUsageLogger to work on different platforms Aug 18, 2020
@rohitgr7
Copy link
Contributor Author

Should it be renamed to GPUUsageLogger or GpuUsageLogger is fine?

@rohitgr7 rohitgr7 changed the title [WIP] Fix GpuUsageLogger to work on different platforms Fix GpuUsageLogger to work on different platforms Aug 18, 2020
@SkafteNicki
Copy link
Member

Should it be renamed to GPUUsageLogger or GpuUsageLogger is fine?

I would keep it as it is, since the double big U makes it less readable in my opinion.

@mergify mergify bot requested a review from a team August 18, 2020 20:34
@rohitgr7
Copy link
Contributor Author

I would keep it as it is, since the double big U makes it less readable in my opinion.

or maybe GPUStatsLogger.

@awaelchli
Copy link
Contributor

I like GPUStatsLogger more than GpuUsageLogger. Another question is whether it should be called a logger or not. I don't have a better suggestion though 👍

@rohitgr7
Copy link
Contributor Author

Yeah, maybe the name should be tweaked so that on should not confuse it with loggers.
I got few suggestions: MonitorGPUStats, GPUStatsMonitor, WatchGPUStats, GPUStatsWatcher

@awaelchli
Copy link
Contributor

+1 for GPUStatsMonitor or just GPUStats

@mergify
Copy link
Contributor

mergify bot commented Aug 19, 2020

This pull request is now in conflict... :(

@SkafteNicki
Copy link
Member

I guess it was named GpuUsageLogger because we also have a LearningRateLogger in callbacks. I also favor GPUStatsMonitor and we should then rename LearningRateLogger to LearningRateMonitor.

@rohitgr7
Copy link
Contributor Author

Yeah, will make changes in few hours and a will do a sep PR for learning rate logger name change.

@rohitgr7 rohitgr7 changed the title [WIP] Fix GpuUsageLogger to work on different platforms Fix GpuUsageLogger to work on different platforms Aug 27, 2020
@rohitgr7
Copy link
Contributor Author

I am for leaving it to another PR :]

Ok then, let's merge this and will continue changes on another PR :)
@Borda @awaelchli

Copy link
Contributor

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohitgr7 need to ask @williamFalcon because he wants to do refactors which blocks all PRs, but this one is not touching any Trainer stuff so I think it should be fine.

@mergify mergify bot requested a review from a team August 27, 2020 17:41
@rohitgr7
Copy link
Contributor Author

yeah, It's just a callback, completely isolated from those things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants