[docs] include cuDNN in installation and performance reference
shelhamer committed Sep 7, 2014
1 parent 490d9b8 commit 5a25c94
Showing 2 changed files with 28 additions and 10 deletions.
18 changes: 12 additions & 6 deletions docs/installation.md
@@ -15,7 +15,7 @@ We have installed Caffe on Ubuntu 14.04, Ubuntu 12.04, OS X 10.9, and OS X 10.8.

Caffe depends on several software packages.

-* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
+* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.5 (recommended), 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
* [BLAS](http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) (provided via ATLAS, MKL, or OpenBLAS).
* [OpenCV](http://opencv.org/).
* [Boost](http://www.boost.org/) (>= 1.55, although only 1.55 is tested)
@@ -25,13 +25,17 @@ Caffe depends on several software packages.
* For the MATLAB wrapper
* MATLAB with the `mex` compiler.

-**CPU-only Caffe**: for cold-brewed CPU-only Caffe uncomment the `CPU_ONLY := 1` in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.
+**cuDNN Caffe**: for fastest operation Caffe is accelerated by drop-in integration of [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). To speed up your Caffe models, install cuDNN then uncomment the `USE_CUDNN := 1` flag in `Makefile.config` when installing Caffe. Acceleration is automatic.
+
+**CPU-only Caffe**: for cold-brewed CPU-only Caffe uncomment the `CPU_ONLY := 1` flag in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.

### CUDA and BLAS

Caffe requires the CUDA `nvcc` compiler to compile its GPU code and the CUDA driver for GPU operation.
To install CUDA, go to the [NVIDIA CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. Install the library and the latest standalone driver separately; the driver bundled with the library is usually out-of-date. **Warning!** The 331.* CUDA driver series has a critical performance issue: do not use it.
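
To catch the 331.* driver series before building, a small shell check can help. This is a minimal sketch, assuming a Bash-compatible shell; `check_driver` is a hypothetical helper, not part of Caffe or the CUDA toolkit.

```shell
#!/usr/bin/env bash
# check_driver VERSION (hypothetical helper): fail if VERSION is in the
# 331.* series that the installation notes warn against, succeed otherwise.
check_driver() {
  case "$1" in
    331.*)
      echo "driver $1 is in the 331.* series: do not use it" >&2
      return 1
      ;;
    *)
      echo "driver $1 OK"
      return 0
      ;;
  esac
}
```

On drivers whose `nvidia-smi` supports the `--query-gpu` option (older releases may not), the installed version can be passed in with `check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"`.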

For best performance, Caffe can be accelerated by [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). Register for free at the cuDNN site, install it, then continue with these installation instructions. To compile with cuDNN, set the `USE_CUDNN := 1` flag in your `Makefile.config`.

Caffe requires BLAS as the backend of its matrix and vector computations.
There are several implementations of this library.
The choice is yours:
@@ -92,7 +96,7 @@ Keep reading to find out how to manually build and install the Google flags libr
On **CentOS / RHEL / Fedora**, most of the dependencies can be installed with

sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel

The Google flags library, Google logging library and LMDB have already made their way into newer versions of **CentOS / RHEL / Fedora**, so it is better to first attempt to install them using `yum`:

sudo yum install gflags-devel glog-devel lmdb-devel
@@ -192,7 +196,7 @@ If you're not using Anaconda, include `hdf5` in the list above.
**Note** that in order to build the Caffe Python wrappers you must install Boost using the `--with-python` option:

brew install --build-from-source --with-python --fresh -vd boost

**Note** that Homebrew maintains itself as a separate git repository and making the above `brew edit FORMULA` changes will change files in your local copy of Homebrew's master branch. By default, this will prevent you from updating Homebrew using `brew update`, as you will get an error message like the following:

$ brew update
@@ -201,7 +205,7 @@ If you're not using Anaconda, include `hdf5` in the list above.
Please, commit your changes or stash them before you can merge.
Aborting
Error: Failure while executing: git pull -q origin refs/heads/master:refs/remotes/origin/master

One solution is to commit your changes to a separate Homebrew branch, run `brew update`, and rebase your changes onto the updated master, as follows:

cd /usr/local
@@ -213,7 +217,7 @@ One solution is to commit your changes to a separate Homebrew branch, run `brew
git rebase master caffe
# Resolve any merge conflicts here
git checkout caffe

At this point, you should be running the latest Homebrew packages and your Caffe-related modifications will remain in place. You may still get the following error:

$ brew update
@@ -240,6 +244,8 @@ The defaults should work, but uncomment the relevant lines if using Anaconda Pyt
make test
make runtest

To compile with cuDNN acceleration, you should uncomment the `USE_CUDNN := 1` switch in `Makefile.config`.

If there is no GPU in your machine, you should switch to CPU-only Caffe by uncommenting `CPU_ONLY := 1` in `Makefile.config`.

To compile the Python and MATLAB wrappers do `make pycaffe` and `make matcaffe` respectively.
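
The `USE_CUDNN` and `CPU_ONLY` switches are plain Makefile assignments that ship commented out, so enabling one is a one-line edit. A minimal sketch of doing it from the shell, assuming `Makefile.config` contains a line of the form `# USE_CUDNN := 1` (editing the file by hand works just as well); `enable_flag` is a hypothetical helper, not part of Caffe.

```shell
#!/usr/bin/env bash
# enable_flag FILE FLAG (hypothetical helper): uncomment a "# FLAG := 1" line
# in a Makefile.config-style file. Assumes the commented line has that form,
# optionally with extra spaces; all other lines are left untouched.
enable_flag() {
  local file="$1" flag="$2"
  # -i.bak is accepted by both GNU and BSD (OS X) sed; drop the backup after.
  sed -i.bak -E "s/^[[:space:]]*#[[:space:]]*(${flag}[[:space:]]*:=[[:space:]]*1)/\1/" "$file"
  rm -f "${file}.bak"
}
```

For example, `enable_flag Makefile.config USE_CUDNN` followed by `make all` rebuilds with cuDNN; `enable_flag Makefile.config CPU_ONLY` does the same for the CPU-only build.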
20 changes: 16 additions & 4 deletions docs/performance_hardware.md
@@ -4,7 +4,7 @@ title: Performance and Hardware Configuration

# Performance and Hardware Configuration

-To measure performance on different NVIDIA GPUs we use the Caffe reference ImageNet model.
+To measure performance on different NVIDIA GPUs we use CaffeNet, the Caffe reference ImageNet model.

For training, each time point is 20 iterations/minibatches of 256 images for 5,120 images total. For testing, a 50,000 image validation set is classified.

@@ -14,11 +14,16 @@ For training, each time point is 20 iterations/minibatches of 256 images for 5,1

Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory.

-Best settings with ECC off and maximum clock speed:
+Best settings with ECC off and maximum clock speed in standard Caffe:

* Training is 26.5 secs / 20 iterations (5,120 images)
* Testing is 100 secs / validation set (50,000 images)

Best settings with Caffe + [cuDNN acceleration](http://nvidia.com/cudnn):

* Training is 19.2 secs / 20 iterations (5,120 images)
* Testing is 60.7 secs / validation set (50,000 images)
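
Read as ratios, the two result lists above give the cuDNN speedup for this configuration (standard Caffe time divided by cuDNN time) and the implied throughput. This is plain arithmetic on the reported figures, not an additional measurement:

```latex
\begin{aligned}
\text{training speedup}    &= 26.5 / 19.2 \approx 1.38 \\
\text{testing speedup}     &= 100 / 60.7 \approx 1.65 \\
\text{training throughput} &= 5120 / 26.5 \approx 193\ \text{img/s} \;\to\; 5120 / 19.2 \approx 267\ \text{img/s} \\
\text{testing throughput}  &= 50000 / 100 = 500\ \text{img/s} \;\to\; 50000 / 60.7 \approx 824\ \text{img/s}
\end{aligned}
```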

Other settings:

* ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set
@@ -50,12 +55,19 @@ but note that this configuration resets across driver reloading / rebooting. Inc
Training: 26.26 secs / 20 iterations (5,120 images).
Testing: 100 secs / validation set (50,000 images).

cuDNN Training: 20.25 secs / 20 iterations (5,120 images).
cuDNN Testing: 66.3 secs / validation set (50,000 images).


## NVIDIA K20

Training: 36.0 secs / 20 iterations (5,120 images).
-Testing: 133 secs / validation set (50,000 images)
+Testing: 133 secs / validation set (50,000 images).

## NVIDIA GTX 770

Training: 33.0 secs / 20 iterations (5,120 images).
-Testing: 129 secs / validation set (50,000 images)
+Testing: 129 secs / validation set (50,000 images).

cuDNN Training: 24.3 secs / 20 iterations (5,120 images).
cuDNN Testing: 104 secs / validation set (50,000 images).
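
The per-GPU standard and cuDNN timings can be compared the same way. A small shell sketch that uses `awk` for the floating-point division; `speedup` and `throughput` are hypothetical helper names, not Caffe tools.

```shell
#!/usr/bin/env bash
# speedup BASE ACCEL (hypothetical helper): standard-Caffe time divided by
# cuDNN time, printed with 2 decimals.
speedup() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

# throughput IMAGES SECONDS (hypothetical helper): images per second,
# printed with 1 decimal.
throughput() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}
```

For the GTX 770 figures, `speedup 33.0 24.3` prints `1.36` (training) and `speedup 129 104` prints `1.24` (testing); for the 26.26 / 20.25 secs training pair listed earlier, `speedup 26.26 20.25` prints `1.30`.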
