PyTorchProfiler crashes when emit_nvtx=True #6153

Closed
NathanGrimaud opened this issue Feb 23, 2021 · 2 comments · Fixed by #6260
Labels: bug (Something isn't working), callback, help wanted (Open to be worked on), priority: 0 (High priority task)

Comments

@NathanGrimaud

🐛 Bug

When training with PyTorchProfiler(emit_nvtx=True), training stops with the following error:
AttributeError: 'emit_nvtx' object has no attribute 'function_events'
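
To illustrate the failure mode, here is a minimal sketch using stand-in classes only. `StubProfile`, `StubEmitNvtx`, and `summarize` are hypothetical names invented for this example; they are not PyTorch or Lightning APIs, and this is not necessarily how the fix in #6260 works. The assumption is that the summary step reads `function_events` from whatever context object was used, while the NVTX-emitting context records nothing in-process and so has no such attribute.

```python
class StubProfile:
    """Hypothetical stand-in for a profiler context that records events."""

    def __init__(self):
        self.function_events = None

    def __enter__(self):
        self.function_events = []
        return self

    def __exit__(self, exc_type, exc, tb):
        # Pretend one event was recorded while the context was active.
        self.function_events.append("step")
        return False


class StubEmitNvtx:
    """Hypothetical stand-in for an NVTX-emitting context.

    It only marks ranges for an external tool, so it deliberately
    has no `function_events` attribute.
    """

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


def summarize(ctx):
    """Summarize recorded events, tolerating contexts that record none."""
    events = getattr(ctx, "function_events", None)
    if events is None:
        return "no in-process events; inspect the trace with an external profiler"
    return f"{len(events)} event(s) recorded"
```

Reading `StubEmitNvtx().function_events` directly raises the same kind of AttributeError as above, whereas `summarize` falls back to a message instead of crashing.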

Please reproduce using the BoringModel

-> https://colab.research.google.com/drive/1cqMxMgDVgltluaYZAkn9d2srSDmOu-1p?usp=sharing

To Reproduce

Use following BoringModel and post here

Expected behavior

Environment

Additional context

Additional docs issue:
The profiler docs (https://pytorch-lightning.readthedocs.io/en/1.2.0/advanced/profiler.html) say that you have to use nvprof to collect the profiler traces, but nvprof is no longer supported on devices with compute capability 8.0 and higher.
I think an example using nsight-compute would be great (I can't test it right now because of the NVTX issue).

@NathanGrimaud added the bug and help wanted labels Feb 23, 2021
@carmocca added the callback and priority: 0 labels Feb 23, 2021
@tchaton
Contributor

tchaton commented Mar 1, 2021

Dear @NathanGrimaud,

I opened a PR with a fix. Would you mind giving it a try?

Best regards,
T.C

@tchaton closed this as completed Mar 1, 2021
@tchaton reopened this Mar 1, 2021
@NathanGrimaud
Author

Well, it works now, thanks!
But I cannot get the whole report working:

nvprof --profile-from-start off -o trace_name.prof -- pytest tests/trainer/test_trainer.py::test_pytorch_profiler_nested_emit_nvtx

python -c "import torch;print(torch.autograd.profiler.load_nvprof('/home/ubuntu/pytorch-lightning/trace_name.prof'))"
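
For reference, the one-liner above can be written as a small helper that fails early when the trace file is missing. This is only a sketch: `load_nvprof_trace` is a hypothetical name, and the only PyTorch call it uses is `torch.autograd.profiler.load_nvprof` from the command above.

```python
import os


def load_nvprof_trace(path):
    """Return the events parsed from an nvprof trace file at `path`."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"nvprof trace not found: {path}")
    # Imported lazily so the path check runs even without PyTorch installed.
    import torch

    return torch.autograd.profiler.load_nvprof(path)
```

Usage would be `print(load_nvprof_trace("trace_name.prof"))`, run from the directory where nvprof wrote the trace.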

nvprof hasn't been supported on NVIDIA GPUs since the Turing architecture (https://docs.nvidia.com/cuda/profiler-users-guide/index.html#profiling-overview).
But this might belong in a new issue (or even a PR if I can get it to work ^^).

3 participants