cudnnConvolutionBackwardData failed - Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnConvolutionBackwardData) #384

Open
ProGamerGov opened this issue Oct 25, 2017 · 8 comments

Comments

@ProGamerGov

I'm not sure what is causing this error or how to fix it:

cudnnConvolutionBackwardData failed:    9        convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA1,3,2615,2816 -filtA64,3,3,3 1,64,2615,2816 -padA1,1 -convStrideA1,1 CUDNN_DATA_FLOAT
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
/home/ubuntu/torch/install/share/lua/5.1/cudnn/find.lua:94: Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnConvolutionBackwardData)
stack traceback:
        [C]: in function 'error'
        /home/ubuntu/torch/install/share/lua/5.1/cudnn/find.lua:94: in function 'checkedCall'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:212: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:201>
        [C]: in function 'xpcall'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function </home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:50>
        [C]: in function 'pcall'
        /home/ubuntu/torch/install/share/lua/5.1/cutorch/init.lua:32: in function 'withDevice'
        /home/ubuntu/torch/install/share/lua/5.1/nn/GPU.lua:112: in function </home/ubuntu/torch/install/share/lua/5.1/nn/GPU.lua:108>
        [C]: in function 'xpcall'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
        neural_style.lua:284: in function 'opfunc'
        /home/ubuntu/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
        neural_style.lua:307: in function 'main'
        neural_style.lua:601: in main chunk
        [C]: in function 'dofile'
        ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00405d50
@ProGamerGov changed the title from "cudnnConvolutionBackwardData failed: 9 convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA1,3,2615,2816 -filtA64,3,3,3 1,64,2615,2816 -padA1,1 -convStrideA1,1 CUDNN_DATA_FLOAT" to "cudnnConvolutionBackwardData failed - Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnConvolutionBackwardData)" on Oct 25, 2017
@ProGamerGov
Author

ProGamerGov commented Oct 25, 2017

I have been trying to push the image size as far as it can go, and I may have hit a limit in Torch7 and/or cuDNN. Search engines show almost nothing for this error.

I was running the latest version of Torch on Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-1038-aws x86_64), with CUDA 9.0 and cuDNN v7.

@ProGamerGov
Author

I assume this error is caused by a limit on some maximum value? If so, can that maximum be changed?

@ProGamerGov
Author

The error appears to come from these areas:

In SpatialConvolution.lua, on line 201: https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua#L201

In SpatialConvolution.lua, on line 209: https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua#L209

@soumith How do I fix this limitation?

@ProGamerGov
Author

After setting cudnn.verbose = true, it seems this may be an out-of-memory issue after all:

https://gist.github.com/ProGamerGov/9e5b367a90cd4be9cbd1ed023dafbb81

I thought I could go a lot higher in terms of image size in Neural-Style, but I did that on an install with earlier versions of Torch and CUDA/cuDNN. Either Torch7 or CUDA/cuDNN has become less memory-efficient, and that is probably why I can't go any higher in terms of image size: jcjohnson/neural-style#429
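
A rough size estimate is consistent with the memory explanation. The sketch below is plain Lua and reads the shapes from the hash line of the error message. It assumes the three shape groups there are the input, filter, and output dimensions in N,C,H,W order, and that CUDNN_DATA_FLOAT means 4 bytes per element. The helper name tensor_bytes is made up for illustration.

```lua
-- Hypothetical helper: size in bytes of a dense 4D tensor.
local function tensor_bytes(n, c, h, w, bytes_per_elem)
  return n * c * h * w * bytes_per_elem
end

local FLOAT_BYTES = 4  -- assumption: CUDNN_DATA_FLOAT is a 32-bit float

-- Shapes copied from the hash line: -dimA1,3,2615,2816 -filtA64,3,3,3 1,64,2615,2816
local input_bytes  = tensor_bytes(1, 3, 2615, 2816, FLOAT_BYTES)
local filter_bytes = tensor_bytes(64, 3, 3, 3, FLOAT_BYTES)
local output_bytes = tensor_bytes(1, 64, 2615, 2816, FLOAT_BYTES)

local GIB = 1024 ^ 3
print(string.format("input:  %.3f GiB", input_bytes / GIB))
print(string.format("filter: %d bytes", filter_bytes))
print(string.format("output: %.3f GiB", output_bytes / GIB))
```

Under those assumptions the 64-channel output of this first layer is about 1.76 GiB on its own. The backward pass also needs a gradient tensor of the same size, plus whatever workspace cuDNN requests, so running out of memory at this resolution is plausible.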

@ngimel
Collaborator

ngimel commented Oct 31, 2017

Try limiting your workspace size by setting cudnn.maxWorkspaceGPUMemPercent (say, to 30 or 40)
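
A minimal sketch of that suggestion, assuming it is set once near the top of the script before the first forward or backward pass (where exactly it belongs in neural_style.lua is an assumption, not something stated in this thread):

```lua
require 'cudnn'

-- Cap the share of GPU memory that cuDNN may use as convolution workspace.
-- 30 is one of the values suggested above; tune it for your GPU and image size.
cudnn.maxWorkspaceGPUMemPercent = 30

-- Optional: log cuDNN's algorithm selection, as done earlier in this thread.
cudnn.verbose = true
```

A lower percentage leaves more memory for activations and gradients, possibly at the cost of cuDNN picking a slower convolution algorithm.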

@Kevinpsk

Kevinpsk commented Dec 8, 2017

Hi guys, I was wondering if any of you have made progress on this problem. I have a similar error with cudnnConvolutionBackwardFilter. See below for the full error message:

cudnnConvolutionBackwardFilter failed: 9 convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA93700,3,20,9 -filtA10,3,9,9 93700,10,12,1 -padA0,0 -convStrideA1,1 CUDNN_DATA_FLOAT
/usr/local/mnt/vega_scratch/scratch/bio_vad/src/torch/install/bin/luajit: ...bio_vad/src/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 2 module of nn.Sequential:
...h/bio_vad/src/torch/install/share/lua/5.1/cudnn/find.lua:94: Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnConvolutionBackwardFilter)
stack traceback:
        [C]: in function 'error'
        ...h/bio_vad/src/torch/install/share/lua/5.1/cudnn/find.lua:94: in function 'checkedCall'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:264: in function 'accGradParameters'
        ...ch/bio_vad/src/torch/install/share/lua/5.1/nn/Module.lua:32: in function <...ch/bio_vad/src/torch/install/share/lua/5.1/nn/Module.lua:29>
        [C]: in function 'xpcall'
        ...bio_vad/src/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        ...io_vad/src/torch/install/share/lua/5.1/nn/Sequential.lua:87: in function <...io_vad/src/torch/install/share/lua/5.1/nn/Sequential.lua:81>
        [C]: in function 'xpcall'
        ...bio_vad/src/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        ...io_vad/src/torch/install/share/lua/5.1/nn/Sequential.lua:91: in function 'backward'
        ...ai/code/CLVTtorch/CLVT_SSF_Trainer/train_noSequencer.lua:106: in function 'opfunc'
        ...o_vad/src/torch/install/share/lua/5.1/optim/adadelta.lua:31: in function 'optimMethod'
        ...ai/code/CLVTtorch/CLVT_SSF_Trainer/train_noSequencer.lua:212: in main chunk
        [C]: in function 'dofile'
        ...ode/CLVTtorch/CLVT_SSF_Trainer/trainCLVT_noSequencer.lua:124: in main chunk
        [C]: in function 'dofile'
        .../src/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x004064f0
Is this a memory issue?

Cheers

@ChangshiFan

@ProGamerGov Have you solved this problem?
