Multi-GPU Caffe training is slow with cuDNN #4901

junshi15 · 2016-10-25T20:57:16Z

I compared 8-GPU Caffe training with and without cuDNN. Surprisingly, enabling cuDNN reduces training speed. Has anybody else seen this?

Here are some details:
OS: RHEL 6.5
CUDA: 7.5
cuDNN: 5.1
GPUs: 8 Tesla K80
Caffe model: CaffeNet reference model
Data set: ImageNet.

Speed:
1 GPU with cuDNN = 1.7x the speed of 1 GPU without cuDNN.
8 GPUs with cuDNN = 0.86x the speed of 8 GPUs without cuDNN.
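
To make the gap concrete, the two ratios above can be combined into a single number: how well the cuDNN build scales from 1 to 8 GPUs relative to the non-cuDNN build. This is only a sketch of the arithmetic; the function and variable names are hypothetical, and the absolute throughputs are not given in this report, so only the relative figure can be derived.

```python
def relative_scaling(speedup_1gpu: float, speedup_8gpu: float) -> float:
    """Return (1->8 GPU scaling with cuDNN) / (1->8 GPU scaling without cuDNN).

    With S1c = speedup_1gpu * S1n and S8c = speedup_8gpu * S8n, where S is
    training speed, c/n mean with/without cuDNN, and 1/8 is the GPU count:
        (S8c / S1c) / (S8n / S1n) = speedup_8gpu / speedup_1gpu
    """
    return speedup_8gpu / speedup_1gpu


# Ratios reported above (cuDNN speed divided by non-cuDNN speed).
cudnn_speedup_1gpu = 1.7
cudnn_speedup_8gpu = 0.86

ratio = relative_scaling(cudnn_speedup_1gpu, cudnn_speedup_8gpu)
print(f"cuDNN build scales at {ratio:.2f}x the rate of the non-cuDNN build")
```

In other words, going from 1 to 8 GPUs, the cuDNN build gains only about half as much as the non-cuDNN build does.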

I can provide more information if needed.

shelhamer · 2017-04-12T01:58:42Z

I don't know what could have caused this. Now that parallelism is handled by multiple processes, though, the number of GPUs should not interact with how each process does its computation, apart from the usual computation/communication trade-offs when scaling across multiple GPUs. Please follow up if this is still an issue after #4563.

shelhamer closed this as completed Apr 12, 2017
