cudnnFindConvolutionForwardAlgorithm failed #22

Closed
drodo opened this issue Feb 1, 2017 · 2 comments

drodo commented Feb 1, 2017

Hey,

I'm getting the above error a couple of seconds after the first training epoch starts:

nClasses:   1000                                                                
nTest:  50000                                                                   
==> doing epoch on training data:                                               
==> online epoch # 1                                                            

cudnnFindConvolutionForwardAlgorithm failed:    2    convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDNN_DATA_FLOAT
/home/drodo/torch/install/bin/luajit: /home/drodo/torch/install/share/lua/5.1/threads/threads.lua:179: [thread 2 endcallback] /home/drodo/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:                                                   
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionForwardAlgorithm failed, sizes:  convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDN
stack traceback:                                                                
[C]: in function 'error'                                                    
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'forwardAlgorithm'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:190: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
[C]: in function 'xpcall'                                                   
/home/drodo/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/drodo/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
/home/drodo/xnornet/XNOR-Net/train.lua:176: in function </home/drodo/xnornet/XNOR-Net/train.lua:157>
[C]: in function 'xpcall'                                                   
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:174: in function 'dojob'
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:223: in function 'addjob'
/home/drodo/xnornet/XNOR-Net/train.lua:108: in function 'train'             
main.lua:50: in main chunk                                                  
[C]: in function 'dofile'                                                   
...rodo/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670   

The dataset is prepared exactly as described in README.md, and the CUDA setup is working. Has anyone come across a similar error when running this ConvNet?

Cheers & Thanks,

--
Dimitrios

drodo closed this as completed Mar 2, 2017

kochie commented Apr 13, 2017

Did someone find a solution?

drodo commented Apr 14, 2017

It was caused by the GPU running out of memory. Try a smaller batch size.
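
For anyone wondering how much the batch size matters here, below is a rough sketch of the arithmetic. The shapes come straight from the cuDNN hash in the log above (`-dimA800,3,224,224` for the input, `800,96,55,55` for the output, `CUDNN_DATA_FLOAT`). The status code `2` in the log should correspond to `CUDNN_STATUS_ALLOC_FAILED` in cuDNN's status enum, which fits the out-of-memory explanation. The helper names (`tensor_bytes`, `conv_out_size`, `first_layer_bytes`) are made up for illustration, and the estimate only counts the input and output tensors of the first convolution. It ignores the cuDNN workspace, gradients, and every other layer, so real usage will be much higher.

```python
# Rough memory estimate for the first convolution in the log.
# Shapes are taken from the cuDNN hash:
#   -dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4
# Helper names are hypothetical; this is not part of Torch or cuDNN.

BYTES_PER_FLOAT = 4  # CUDNN_DATA_FLOAT is a 32-bit float


def tensor_bytes(shape, bytes_per_elem=BYTES_PER_FLOAT):
    """Size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_elem


def conv_out_size(in_size, kernel, pad, stride):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * pad - kernel) // stride + 1


def first_layer_bytes(batch):
    """Input plus output activation bytes for the first conv layer only."""
    side = conv_out_size(224, kernel=11, pad=2, stride=4)  # 55, as in the log
    input_bytes = tensor_bytes((batch, 3, 224, 224))
    output_bytes = tensor_bytes((batch, 96, side, side))
    return input_bytes + output_bytes


if __name__ == "__main__":
    for batch in (800, 400, 200):
        gib = first_layer_bytes(batch) / 2**30
        print(f"batch {batch}: about {gib:.2f} GiB for first-layer input + output")
```

Activation memory grows linearly with the batch dimension, so halving the batch size roughly halves this figure.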
