cudnnFindConvolutionForwardAlgorithm failed #22

drodo · 2017-02-01T16:36:21Z

Hey,

I'm getting the above error a couple of seconds after the first training epoch starts:

nClasses:   1000                                                                
nTest:  50000                                                                   
==> doing epoch on training data:                                               
==> online epoch # 1                                                            

cudnnFindConvolutionForwardAlgorithm failed:    2    convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDNN_DATA_FLOAT
/home/drodo/torch/install/bin/luajit: /home/drodo/torch/install/share/lua/5.1/threads/threads.lua:179: [thread 2 endcallback] /home/drodo/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:                                                   
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionForwardAlgorithm failed, sizes:  convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDN
stack traceback:                                                                
[C]: in function 'error'                                                    
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'forwardAlgorithm'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:190: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
[C]: in function 'xpcall'                                                   
/home/drodo/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/drodo/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
/home/drodo/xnornet/XNOR-Net/train.lua:176: in function </home/drodo/xnornet/XNOR-Net/train.lua:157>
[C]: in function 'xpcall'                                                   
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:174: in function 'dojob'
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:223: in function 'addjob'
/home/drodo/xnornet/XNOR-Net/train.lua:108: in function 'train'             
main.lua:50: in main chunk                                                  
[C]: in function 'dofile'                                                   
...rodo/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
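The `hash=` string in the message appears to encode the convolution that failed. It seems to contain the input shape (`-dimA`, N,C,H,W), the filter shape (`-filtA`, K,C,kH,kW), the output shape, the padding and the stride. The Python sketch below is an editor's illustration and not part of XNOR-Net or cudnn.torch. `parse_ints` and `conv_out` are hypothetical helpers, and it assumes no dilation. It parses those fields and checks the output size with the usual formula floor((H + 2p - k) / s) + 1.

```python
import re

# Hash string copied verbatim from the cudnn.torch error message in the log.
HASH = ("-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 "
        "-padA2,2 -convStrideA4,4 CUDNN_DATA_FLOAT")


def parse_ints(tag: str, text: str) -> list[int]:
    """Return the comma-separated integers that directly follow `tag`."""
    m = re.search(re.escape(tag) + r"([\d,]+)", text)
    if m is None:
        raise ValueError(f"{tag} not found in hash")
    return [int(x) for x in m.group(1).split(",")]


def conv_out(size: int, kernel: int, pad: int, stride: int) -> int:
    """Output size of a cross-correlation (no dilation) along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


n, c, h, w = parse_ints("-dimA", HASH)     # input: batch, channels, height, width
k, _, kh, kw = parse_ints("-filtA", HASH)  # filters: count, channels, kernel h/w
ph, pw = parse_ints("-padA", HASH)         # padding per spatial axis
sh, sw = parse_ints("-convStrideA", HASH)  # stride per spatial axis

out_shape = (n, k, conv_out(h, kh, ph, sh), conv_out(w, kw, pw, sw))
print(out_shape)  # (800, 96, 55, 55), matching the shape printed in the hash
```

The computed shape agrees with the `800,96,55,55` field. This suggests that the layer configuration itself is consistent and that the failure lies elsewhere.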

The dataset is prepared exactly as described in README.md, and the CUDA setup is working. Has anyone come across a similar error when running this ConvNet?

Cheers & Thanks,

--
Dimitrios


kochie · 2017-04-13T04:29:49Z

Did someone find a solution?

drodo · 2017-04-14T18:52:16Z

This is caused by the GPU running out of memory. Try a smaller batch size.
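A rough sketch of why a smaller batch helps follows. The `2` printed after "failed:" in the log appears to be the cuDNN status code, and status 2 is `CUDNN_STATUS_ALLOC_FAILED`, which fits an out-of-memory condition. The estimate assumes 32-bit floats (`CUDNN_DATA_FLOAT` in the log) and the shapes from the `hash=` string. It counts only the input batch and the first convolution's output, so it is a lower bound. Other layers, gradients, and the workspace cuDNN allocates while benchmarking algorithms all add to it. `tensor_mib` is a hypothetical helper, not a Torch function.

```python
BYTES_PER_FLOAT = 4  # CUDNN_DATA_FLOAT is a 32-bit float


def tensor_mib(*shape: int) -> float:
    """Size in MiB of a dense float32 tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_FLOAT / 2**20


# Input batch and first-convolution output for a few candidate batch sizes.
for batch in (800, 400, 200):
    inp = tensor_mib(batch, 3, 224, 224)
    out = tensor_mib(batch, 96, 55, 55)
    print(f"batch {batch}: input {inp:.0f} MiB, conv1 output {out:.0f} MiB")
```

Both tensors scale linearly with the batch size, so halving the batch roughly halves the activation memory for these layers.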

drodo closed this as completed Mar 2, 2017

ghost mentioned this issue Jul 1, 2017: Error training! facebookarchive/fb.resnet.torch#153 (Open)

ProGamerGov mentioned this issue Oct 31, 2017: cudnnConvolutionBackwardData failed - Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnConvolutionBackwardData) soumith/cudnn.torch#384 (Open)

XingangPan mentioned this issue Aug 10, 2018: training error---cudnnFindConvolutionForwardAlgorithm failed XingangPan/SCNN#51 (Closed)
