GPU Usage Logger #2932

Merged
merged 6 commits into from
Aug 12, 2020

groadabike (Contributor) commented Aug 12, 2020

What does this PR do?

This PR adds a Callback that logs GPU utilisation during the training stage. It gives a sense of how resources are being used and helps measure the effect of any implementation improvement.

The callback queries nvidia-smi for the following GPU stats and logs them:

  • "fan.speed"
  • "memory.used"
  • "memory.free"
  • "utilization.memory"
  • "utilization.gpu"
  • "temperature.gpu"
  • "temperature.memory"
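A minimal sketch of how such a query could work, assuming nvidia-smi is installed and on PATH. It uses nvidia-smi's --query-gpu and --format=csv,noheader,nounits options, which print one comma-separated line per GPU. The names GPU_STAT_KEYS, parse_gpu_stats and query_gpu_stats are hypothetical and are not the code merged in this PR.

```python
import shutil
import subprocess
from typing import Dict, List, Optional, Sequence

# Stats from the list above; "utilization.gpu" is nvidia-smi's field
# for the GPU utilisation percentage.
GPU_STAT_KEYS = (
    "fan.speed",
    "memory.used",
    "memory.free",
    "utilization.gpu",
    "utilization.memory",
    "temperature.gpu",
    "temperature.memory",
)


def _to_float(cell: str) -> Optional[float]:
    """Convert one CSV cell; unsupported fields come back as e.g. '[N/A]'."""
    try:
        return float(cell)
    except ValueError:
        return None


def parse_gpu_stats(csv_text: str, keys: Sequence[str]) -> List[Dict[str, Optional[float]]]:
    """Parse csv,noheader,nounits output: one line per GPU, one column per key."""
    rows = []
    for line in csv_text.strip().splitlines():
        cells = [cell.strip() for cell in line.split(",")]
        rows.append({key: _to_float(cell) for key, cell in zip(keys, cells)})
    return rows


def query_gpu_stats(keys: Sequence[str] = GPU_STAT_KEYS) -> List[Dict[str, Optional[float]]]:
    """Run nvidia-smi once and return the parsed stats for every visible GPU."""
    # Resolve the full executable path up front instead of passing a bare name.
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        raise RuntimeError("nvidia-smi was not found on PATH")
    result = subprocess.run(
        [nvidia_smi, "--query-gpu=" + ",".join(keys), "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout, keys)
```

In a callback, query_gpu_stats() would be called around each batch and the returned values passed to the logger. Parsing is kept separate from the subprocess call so it can be checked without a GPU.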

It also measures the time between batches (inter_step_time) and within a batch (intra_step_time).

These two metrics show how the total batch time is split between loading the batch and passing it through the model.
For example, a large inter_step_time may indicate slow data loading, and any improvement to the Dataset class should be reflected as a reduction in this value.
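The two timings can be sketched with a hypothetical StepTimer that is told when each batch starts and ends; it is a plain-Python illustration, not the Lightning Callback API. intra_step_time is the gap from batch start to batch end, and inter_step_time is the gap from the end of one batch to the start of the next, which is where fetching the next batch happens. time.sleep stands in for both data loading and the batch pass.

```python
import time
from typing import Iterable, Iterator, List, Optional


class StepTimer:
    """Hypothetical helper that records intra- and inter-step times in seconds."""

    def __init__(self) -> None:
        self._batch_start: Optional[float] = None
        self._batch_end: Optional[float] = None
        self.intra_step_times: List[float] = []
        self.inter_step_times: List[float] = []

    def batch_start(self) -> None:
        now = time.perf_counter()
        # Gap since the previous batch finished (mostly loading the next batch).
        if self._batch_end is not None:
            self.inter_step_times.append(now - self._batch_end)
        self._batch_start = now

    def batch_end(self) -> None:
        now = time.perf_counter()
        # Time spent processing the current batch.
        if self._batch_start is not None:
            self.intra_step_times.append(now - self._batch_start)
        self._batch_end = now


def slow_loader(n: int, load_seconds: float) -> Iterator[int]:
    """Simulated data loader; each item takes load_seconds to produce."""
    for i in range(n):
        time.sleep(load_seconds)
        yield i


def train(batches: Iterable[int], timer: StepTimer, step_seconds: float) -> None:
    """Toy training loop; the sleep stands in for the batch pass."""
    for _ in batches:  # fetching the next item falls between batch_end and batch_start
        timer.batch_start()
        time.sleep(step_seconds)
        timer.batch_end()
```

With a loader that is slower than the batch pass, for example train(slow_loader(3, 0.05), timer, 0.005), the recorded inter_step_times dominate. That is the pattern the description above associates with slow data loading. The first batch has no inter_step_time because there is no previous batch end.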

Fixes #2074

  • Was this discussed/approved via a GitHub issue? (Not needed for typo fixes and docs improvements.)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

pep8speaks commented Aug 12, 2020

Hello @groadabike! Thanks for updating this PR.

Line 92:49: E127 continuation line over-indented for visual indent
Line 115:49: E127 continuation line over-indented for visual indent
Line 141:108: W292 no newline at end of file

Comment last updated at 2020-08-12 14:17:24 UTC

mergify bot requested a review from a team on August 12, 2020 13:16
williamFalcon (Contributor) commented:

How is this different from the flag to log GPU usage?

groadabike (Contributor, Author) replied:

The log_gpu_memory flag in Trainer only logs memory.used, or the min and max memory used.
It doesn't log the other stats included in this PR,
for example the GPU utilisation (%) or the free memory.
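To illustrate the difference, here is a sketch contrasting a min/max reduction of memory.used with keeping every stat per GPU. The function names and the metric keys (min_gpu_mem, max_gpu_mem, gpu_<index>/<stat>) are hypothetical and chosen for illustration only. They are not the keys produced by Trainer or by this callback.

```python
from typing import Dict, List, Optional


def min_max_memory(memory_used_mb: List[float]) -> Dict[str, float]:
    """Reduce per-GPU memory.used to just two numbers."""
    return {"min_gpu_mem": min(memory_used_mb), "max_gpu_mem": max(memory_used_mb)}


def per_gpu_metrics(rows: List[Dict[str, Optional[float]]]) -> Dict[str, float]:
    """Keep every available stat for every GPU under a 'gpu_<index>/<stat>' key."""
    metrics: Dict[str, float] = {}
    for index, row in enumerate(rows):
        for stat, value in row.items():
            if value is not None:  # skip fields nvidia-smi reported as unavailable
                metrics[f"gpu_{index}/{stat}"] = value
    return metrics
```

The min/max summary loses which GPU used how much memory and says nothing about utilisation, fan speed or temperature, which is the gap the callback is meant to fill.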

codecov bot commented Aug 12, 2020

Codecov Report

Merging #2932 into master will increase coverage by 3%.
The diff coverage is 40%.

@@           Coverage Diff           @@
##           master   #2932    +/-   ##
=======================================
+ Coverage      86%     90%    +3%     
=======================================
  Files          80      81     +1     
  Lines        7449    7542    +93     
=======================================
+ Hits         6430    6755   +325     
+ Misses       1019     787   -232     

williamFalcon merged commit f6a3d8f into Lightning-AI:master on Aug 12, 2020
williamFalcon (Contributor) commented:

ok cool. makes sense!

let's clean up the docs for this

Borda added the "feature" (Is an improvement or enhancement) label on Aug 13, 2020
ameliatqy pushed a commit to ameliatqy/pytorch-lightning that referenced this pull request Aug 17, 2020
* GPU utilisation Callback

* GPU utilisation Callback

* Fixing style

* Fixing style

* Fixing CodeFactor: partial executable path

* Fix a misspelling in the Class name
Borda added this to the 0.9.0 milestone on Aug 20, 2020
Labels: feature (Is an improvement or enhancement)

4 participants