Commit
[feat] Add PyTorch Profiler. (#5560)
* add profiler

* add profiler

* update

* resolve flake8

* update doc

* update changelog

* clean doc

* delete prof file

* merge pr codebase

* update

* update doc

* update doc

* update doc

* update on comments

* update docstring

* update docstring

* try

* update test

* Update pytorch_lightning/profiler/__init__.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/profiler/__init__.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* remove old code

* add support for ddp

* resolve flake8

* Update pytorch_lightning/profiler/__init__.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* resolve tests

* resolve flake8

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
4 people authored Jan 26, 2021
1 parent f782230 commit 5f33728
Showing 13 changed files with 500 additions and 13 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -141,3 +141,4 @@ pytorch\ lightning
test-reports/
wandb
.forked/
*.prof
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -59,6 +59,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- `Recall` and `Precision` metrics (and their functional counterparts `recall` and `precision`) can now be generalized to Recall@K and Precision@K with the use of `top_k` parameter ([#4842](https://github.com/PyTorchLightning/pytorch-lightning/pull/4842))


- Added `PyTorchProfiler` ([#5560](https://github.com/PyTorchLightning/pytorch-lightning/pull/5560))


### Changed

5 changes: 3 additions & 2 deletions pytorch_lightning/core/memory.py
@@ -16,7 +16,7 @@
import shutil
import subprocess
from collections import OrderedDict
from typing import Tuple, Dict, Union, List, Any
from typing import Any, Dict, List, Tuple, Union

import numpy as np
import torch
@@ -182,7 +182,8 @@ def __init__(self, model, mode: str = MODE_DEFAULT):
self._model = model
self._mode = mode
self._layer_summary = self.summarize()
self._precision_megabytes = (self._model.precision / 8.0) * 1e-6 # 1 byte -> 8 bits
# 1 byte -> 8 bits
self._precision_megabytes = (self._model.precision / 8.0) * 1e-6

@property
def named_modules(self) -> List[Tuple[str, nn.Module]]:
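For context, a small worked sketch (not part of the diff) of the bits-to-megabytes conversion this hunk refactors::

    # precision is given in bits: divide by 8 for bytes, scale by 1e-6 for MB.
    precision_bits = 32                      # e.g. full-precision weights
    precision_megabytes = (precision_bits / 8.0) * 1e-6
    assert precision_megabytes == 4e-6       # 4 bytes -> 4e-6 MB per parameter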
89 changes: 87 additions & 2 deletions pytorch_lightning/profiler/__init__.py
@@ -50,7 +50,7 @@
Advanced Profiling
--------------------
------------------
If you want more information on the functions called during each event, you can use the `AdvancedProfiler`.
This option uses Python's cProfiler_ to provide a report of time spent on *each* function called within your code.
@@ -114,13 +114,98 @@ def custom_processing_step(self, data):
model = MyModel(profiler)
trainer = Trainer(profiler=profiler, max_epochs=1)
PyTorch Profiling
-----------------
Autograd includes a profiler that lets you inspect the cost of different operators
inside your model, on both the CPU and GPU.
Find the PyTorch Profiler documentation at `PyTorch Profiler <https://pytorch-lightning.readthedocs.io/en/stable/profiler.html>`_.
.. code-block:: python

    trainer = Trainer(..., profiler="pytorch")

or

.. code-block:: python

    profiler = PyTorchProfiler(...)
    trainer = Trainer(..., profiler=profiler)
This profiler works with PyTorch ``DistributedDataParallel``.
If ``output_filename`` is provided, each rank will save its profiled operations to its own file.
The profiler's results will be printed on the completion of ``fit()``. Since this report
can be quite long, you can also specify an ``output_filename`` to save the report
instead of logging it to your terminal.
By default, this profiler records only the ``training_step_and_backward``, ``evaluation_step``
and ``test_step`` functions. The output below shows the profiling for the action
``training_step_and_backward``. Pass ``PyTorchProfiler(profiled_functions=[...])`` to extend
the scope of profiled functions, as in the sketch that follows.
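For example, a minimal sketch (the output path is hypothetical; the action names are the documented defaults):

.. code-block:: python

    profiler = PyTorchProfiler(
        output_filename="profile_report.txt",  # hypothetical path; one file per rank under DDP
        profiled_functions=["training_step_and_backward", "evaluation_step", "test_step"],
    )
    trainer = Trainer(profiler=profiler, max_epochs=1)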
.. note:: When using the PyTorch Profiler, wall clock time will not be representative of the true wall clock time. This is due to forcing profiled operations to be measured synchronously, when many CUDA ops happen asynchronously. It is recommended to use this Profiler to find bottlenecks/breakdowns, however for end-to-end wall clock time use the ``SimpleProfiler``. # noqa E501
.. code-block:: python
Profiler Report
Profile stats for: training_step_and_backward
--------------------- --------------- --------------- --------------- --------------- ---------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg
--------------------- --------------- --------------- --------------- --------------- ---------------
t 62.10% 1.044ms 62.77% 1.055ms 1.055ms
addmm 32.32% 543.135us 32.69% 549.362us 549.362us
mse_loss 1.35% 22.657us 3.58% 60.105us 60.105us
mean 0.22% 3.694us 2.05% 34.523us 34.523us
div_ 0.64% 10.756us 1.90% 32.001us 16.000us
ones_like 0.21% 3.461us 0.81% 13.669us 13.669us
sum_out 0.45% 7.638us 0.74% 12.432us 12.432us
transpose 0.23% 3.786us 0.68% 11.393us 11.393us
as_strided 0.60% 10.060us 0.60% 10.060us 3.353us
to 0.18% 3.059us 0.44% 7.464us 7.464us
empty_like 0.14% 2.387us 0.41% 6.859us 6.859us
empty_strided 0.38% 6.351us 0.38% 6.351us 3.175us
fill_ 0.28% 4.782us 0.33% 5.566us 2.783us
expand 0.20% 3.336us 0.28% 4.743us 4.743us
empty 0.27% 4.456us 0.27% 4.456us 2.228us
copy_ 0.15% 2.526us 0.15% 2.526us 2.526us
broadcast_tensors 0.15% 2.492us 0.15% 2.492us 2.492us
size 0.06% 0.967us 0.06% 0.967us 0.484us
is_complex 0.06% 0.961us 0.06% 0.961us 0.481us
stride 0.03% 0.517us 0.03% 0.517us 0.517us
--------------------- --------------- --------------- --------------- --------------- ---------------
Self CPU time total: 1.681ms
When running with ``PyTorchProfiler(emit_nvtx=True)``, you should run nvprof as follows::
nvprof --profile-from-start off -o trace_name.prof -- <regular command here>
To visualize the profiled operations, you can either:
* Use::
nvvp trace_name.prof
* Use::
python -c 'import torch; print(torch.autograd.profiler.load_nvprof("trace_name.prof"))'
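Or, as a small sketch (not part of this commit), inspect the loaded events programmatically;
this assumes each returned event exposes ``name`` and ``cpu_time_total`` attributes:

.. code-block:: python

    import torch

    # Load the nvprof trace and print the ten ops with the largest CPU time.
    events = torch.autograd.profiler.load_nvprof("trace_name.prof")
    for evt in sorted(events, key=lambda e: e.cpu_time_total, reverse=True)[:10]:
        print(evt.name, evt.cpu_time_total)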
"""

from pytorch_lightning.profiler.profilers import AdvancedProfiler, BaseProfiler, PassThroughProfiler, SimpleProfiler
from pytorch_lightning.profiler.profilers import (
AdvancedProfiler,
BaseProfiler,
PassThroughProfiler,
PyTorchProfiler,
SimpleProfiler,
)

__all__ = [
'BaseProfiler',
'SimpleProfiler',
'AdvancedProfiler',
'PassThroughProfiler',
"PyTorchProfiler",
]
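As an end-to-end usage sketch (a hypothetical script, not part of the diff), the new profiler can be enabled either way described in the docstring above::

    from pytorch_lightning import Trainer
    from pytorch_lightning.profiler import PyTorchProfiler

    # Either pass the string alias...
    trainer = Trainer(profiler="pytorch", max_epochs=1)

    # ...or configure the profiler explicitly and save the report to disk.
    profiler = PyTorchProfiler(output_filename="pytorch_profile.txt")
    trainer = Trainer(profiler=profiler, max_epochs=1)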