
Can not use tensor cores #221

Closed
vaibhav0195 opened this issue Mar 26, 2019 · 8 comments

Comments

@vaibhav0195

Hi,
I am on an Ubuntu 16.04 machine with an RTX 2080 Ti, using CUDA 10.0, cuDNN 7.4, Python 3.7, and PyTorch 1.0.1.
I converted the model to use Tensor Cores with the amp module, as shown in this example:

https://nvidia.github.io/apex/amp.html

but when I run my Python program under the nvprof profiler as described here:
https://devtalk.nvidia.com/default/topic/1047165/how-to-confirm-whether-tensor-core-is-working-or-not-/

I get:

No events/metrics were profiled.

which, as stated by the moderator, should not occur if my Tensor Cores were being used.
Can anyone help me understand why this is happening? Any help is appreciated.
Thanks

@mcarilli
Contributor

What was the command line you used to run your script under nvprof?

@vaibhav0195
Author

/usr/local/cuda/bin/nvprof --kernels compute_gemm --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization python myscript.py

@hellojialee

Hi, @vaibhav0195, @mcarilli, must we change all the dimensions (N, C, H, W) of a tensor so that they are divisible by 8 before we can make use of Tensor Cores?

@vaibhav0195
Author

@mcarilli I think making just the input and output channels of the conv layers and the batch size multiples of 8 should do the trick.

@mcarilli
Contributor

Convolutions:
For cuDNN versions 7.2 and earlier, @vaibhav0195 is correct: input channels, output channels, and batch size should be multiples of 8 to use Tensor Cores. This requirement is lifted for cuDNN 7.3 and later, so with those versions you don't need to worry about making your channels or batch size multiples of 8 to enable Tensor Core use.
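For the cuDNN <= 7.2 case, the alignment rule amounts to rounding each relevant dimension up to the next multiple of 8. A small sketch (the helper name `pad_to_multiple` is made up here for illustration):

```python
def pad_to_multiple(n: int, multiple: int = 8) -> int:
    """Round n up to the nearest multiple (8 for Tensor Core alignment)."""
    return ((n + multiple - 1) // multiple) * multiple

# e.g. a conv layer with 30 input and 70 output channels would need to be
# widened to 32 and 72 channels for Tensor Cores to engage on cuDNN <= 7.2
print(pad_to_multiple(30))  # 32
print(pad_to_multiple(70))  # 72
print(pad_to_multiple(64))  # 64 (already aligned)
```

In practice you would pick the padded channel counts when defining the model, rather than reshaping tensors at runtime.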

GEMMs (fully connected layers):
For matrix A x matrix B, where A has size [I, J] and B has size [J, K], I, J, and K must all be multiples of 8 to use Tensor Cores. This requirement holds for all cublas and cudnn versions. For a bare fully connected layer, that means the batch size, input features, and output features must be multiples of 8. For RNNs, you usually (though not always; it can depend on the architecture and what you use for the encoder/decoder) need the batch size, hidden size, embedding size, and dictionary size to be multiples of 8.
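The GEMM rule above can be written as a one-line check over the three dimensions (an illustrative predicate, not part of any NVIDIA API):

```python
def gemm_uses_tensor_cores(i: int, j: int, k: int) -> bool:
    """For A[I, J] @ B[J, K] in FP16, all of I, J, K must be multiples of 8."""
    return all(d % 8 == 0 for d in (i, j, k))

# batch=64, in_features=1000, out_features=512: all divisible by 8
print(gemm_uses_tensor_cores(64, 1000, 512))  # True
# out_features=500 breaks the rule (500 % 8 != 0)
print(gemm_uses_tensor_cores(64, 1000, 500))  # False
```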

@hellojialee

@mcarilli Thank you for your clear explanation.

@mcarilli
Contributor

mcarilli commented Mar 30, 2019

It may also help to set
torch.backends.cudnn.benchmark = True
at the top of your script, which enables PyTorch's cudnn autotuner. Each time PyTorch encounters a new set of convolution parameters, it tests all available cudnn algorithms to find the fastest one, then caches that choice and reuses it whenever it encounters the same set of convolution parameters again. The first iteration of your network will be slower while PyTorch benchmarks the cudnn algorithms for each convolution, but the second and later iterations will likely be faster.

@zhenhuahu


Hi, thanks for your detailed explanation. Is the command to enable the autotuner,
torch.backends.cudnn.benchmark = True
specific to Apex, or can we use it in more general cases?
Thanks.
