Synchronous SGD via layer-wise parallelism #2219
Commits on Mar 27, 2015
- 4fe9305
- c4590db: forward declare instead of including boost/thread.hpp (BVLC#1009)
  This means Caffe::Get has to move to common.cpp and lose its "inline" (with no real performance impact).
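  A minimal sketch of the forward-declaration pattern this commit describes, with simplified names; the thread-local singleton detail is an assumption based on Caffe's common.cpp of that era, not a quote of the commit:

  ```cpp
  // common.hpp (sketch): forward-declare instead of pulling in boost/thread.hpp.
  namespace boost { class thread; }

  namespace caffe {
  class Caffe {
   public:
    static Caffe& Get();  // no longer inline: defined in common.cpp
   private:
    Caffe() {}
  };
  }  // namespace caffe

  // common.cpp (sketch): the heavy include lives only here.
  #include <boost/thread.hpp>

  namespace caffe {
  // One instance per thread; thread_specific_ptr needs the full boost
  // definitions, which is why Get() had to move out of the header.
  static boost::thread_specific_ptr<Caffe> thread_instance_;

  Caffe& Caffe::Get() {
    if (!thread_instance_.get()) {
      thread_instance_.reset(new Caffe());
    }
    return *thread_instance_.get();
  }
  }  // namespace caffe
  ```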
- 70ac334
- 10ac0ff: keep track of layer graph in Net
  Instead of just keeping track of input and output blobs, also keep track of layer dependencies. (Also adjust AppendBottom's argument types to avoid passing an input as a pointer.)
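  A minimal sketch of this kind of dependency tracking (illustrative structures, not the PR's actual Net members): a layer depends on whichever layer most recently produced each of its bottom blobs.

  ```cpp
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  struct LayerGraph {
    // blob name -> index of the layer that most recently produced it
    std::map<std::string, int> blob_producer;
    // layer index -> indices of the layers it consumes blobs from
    std::vector<std::set<int> > bottom_deps;

    void AppendTop(int layer_id, const std::string& blob) {
      blob_producer[blob] = layer_id;
    }
    void AppendBottom(int layer_id, const std::string& blob) {
      if ((int)bottom_deps.size() <= layer_id) bottom_deps.resize(layer_id + 1);
      std::map<std::string, int>::const_iterator it = blob_producer.find(blob);
      if (it != blob_producer.end()) bottom_deps[layer_id].insert(it->second);
    }
  };
  ```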
- 6c2b0b5
  This simplifies the OS X build, and will allow use of the per-thread default stream for running existing layer code asynchronously.
- a67e216: [build] use CUDA 7's per thread default stream
  Note that this may cause issues with code that assumes either explicit or device-level synchronization, which we'll fix in the next commit.
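  For reference, CUDA 7's two opt-in mechanisms for the per-thread default stream (these are CUDA's own flag and macro, not additions from this PR):

  ```cpp
  // Option 1: compile with  nvcc --default-stream per-thread
  // Option 2: define this macro before including any CUDA runtime header:
  #define CUDA_API_PER_THREAD_DEFAULT_STREAM 1
  #include <cuda_runtime.h>

  // With either option, default-stream work launched by different host
  // threads no longer serializes through the single legacy default stream.
  ```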
- c7357b9: always sync the default stream after GPU forward or backward
  This ensures that layers are synchronous with respect to each other, even when layer code doesn't use explicit streams.
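  A minimal sketch of the wrapper idea, using CUDA 7's cudaStreamPerThread handle; the class and method names are simplified stand-ins for Caffe's Layer, not its actual code.

  ```cpp
  #include <cuda_runtime.h>
  #include <vector>

  template <typename Dtype> class Blob;  // placeholder for the sketch

  template <typename Dtype>
  class LayerSketch {
   public:
    virtual ~LayerSketch() {}
    void Forward(const std::vector<Blob<Dtype>*>& bottom,
                 const std::vector<Blob<Dtype>*>& top) {
      Forward_gpu(bottom, top);
      // Layer code may launch kernels on the default stream with no explicit
      // sync; wait here so the next layer (possibly on another thread or
      // device) only ever sees completed results.
      cudaStreamSynchronize(cudaStreamPerThread);
    }
   protected:
    virtual void Forward_gpu(const std::vector<Blob<Dtype>*>& bottom,
                             const std::vector<Blob<Dtype>*>& top) = 0;
  };
  ```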
- 3ac616f
- 832b273
- 6a8525d: always call Layer::Reshape in Layer::Forward
  There are no cases where Forward is called without Reshape, so we can simplify the call structure.
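  A sketch of the simplified call structure, reduced to the CPU path (Caffe's real Forward also dispatches to the GPU path):

  ```cpp
  #include <vector>

  template <typename Dtype> class Blob;  // placeholder for the sketch

  template <typename Dtype>
  class LayerSketch {
   public:
    virtual ~LayerSketch() {}
    Dtype Forward(const std::vector<Blob<Dtype>*>& bottom,
                  const std::vector<Blob<Dtype>*>& top) {
      // Always reshape: cheap when shapes are unchanged, and it removes the
      // separate "call Reshape before Forward" convention for every caller.
      Reshape(bottom, top);
      return Forward_cpu(bottom, top);
    }
   protected:
    virtual void Reshape(const std::vector<Blob<Dtype>*>& bottom,
                         const std::vector<Blob<Dtype>*>& top) = 0;
    virtual Dtype Forward_cpu(const std::vector<Blob<Dtype>*>& bottom,
                              const std::vector<Blob<Dtype>*>& top) = 0;
  };
  ```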
- e5cda03
- c70a21e
- a44b3bf
- 404e61d
- 37edfd9
- 57b813c
- c3e247e: expose boost::thread::interrupt as InternalThread::Interrupt
  This will allow us to cleanly kill compute threads that are waiting for work.
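  A minimal sketch of the exposed call (member names illustrative). boost delivers the interruption at interruption points such as condition-variable waits, which is exactly where an idle compute thread blocks.

  ```cpp
  #include <boost/shared_ptr.hpp>
  #include <boost/thread.hpp>

  class InternalThread {
   public:
    void Interrupt() {
      // Raises boost::thread_interrupted inside the thread at its next
      // interruption point (e.g. a condition-variable wait for work).
      if (thread_) thread_->interrupt();
    }
   protected:
    boost::shared_ptr<boost::thread> thread_;
  };
  ```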
- 1a79768: layers get device and thread_id
  This gives us a way to specify layer-level execution placement for layerwise parallelism, implemented in future commits.
- 84ef229: split layer works across devices
  Split layer gains a param, top_device, which allows tops to live on different, explicitly specified devices. Params are automatically copied and diffs are automatically accumulated. Because the implementation is now device-agnostic, it lives only in the *_cpu functions.
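  A minimal sketch of the cross-device copy such a split needs on the forward pass (a hypothetical helper, not the PR's code); the backward pass would accumulate each top's diff back into the bottom the same way.

  ```cpp
  #include <cuda_runtime.h>
  #include <vector>

  // Copy `count` floats from src on src_device to each top buffer, which may
  // live on a different device.
  void SplitForwardSketch(const float* src, int src_device, size_t count,
                          const std::vector<float*>& tops,
                          const std::vector<int>& top_devices) {
    for (size_t i = 0; i < tops.size(); ++i) {
      if (top_devices[i] == src_device) {
        cudaMemcpy(tops[i], src, count * sizeof(float),
                   cudaMemcpyDeviceToDevice);
      } else {
        // Peer copy handles the cross-device case (staging through the host
        // when direct peer access is unavailable).
        cudaMemcpyPeer(tops[i], top_devices[i], src, src_device,
                       count * sizeof(float));
      }
    }
  }
  ```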
- cf9bb2d: split layers are automatically inserted between devices
  This fills in the top_device param of split layer according to the device params of the connecting layers.
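  A sketch of the insertion rule with simplified structures (not the PR's net-construction code): each consumer of a blob contributes one split top, pinned to that consumer's device.

  ```cpp
  #include <vector>

  // Device param of each layer consuming the blob (illustrative structure).
  struct Consumer { int device; };

  // Fill split's top_device list from the consumers' device params; tops that
  // match the producer's device behave like an ordinary same-device split.
  std::vector<int> SplitTopDevices(const std::vector<Consumer>& consumers) {
    std::vector<int> top_devices;
    top_devices.reserve(consumers.size());
    for (size_t i = 0; i < consumers.size(); ++i) {
      top_devices.push_back(consumers[i].device);
    }
    return top_devices;
  }
  ```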
- 3e3a0eb
- 121b912
- 29cdf53: Net sets device before layer setup
  This is necessary to ensure that buffers are allocated on the correct devices.
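  The underlying CUDA behavior this relies on, in a minimal sketch: allocations follow the calling thread's current device, so the device must be selected before SetUp allocates anything.

  ```cpp
  #include <cuda_runtime.h>
  #include <cstddef>

  // cudaSetDevice is per host thread; a later cudaMalloc lands on that device.
  float* AllocateOnDevice(int device, std::size_t count) {
    float* ptr = 0;
    cudaSetDevice(device);                    // select before allocating
    cudaMalloc(&ptr, count * sizeof(float));  // allocated on `device`
    return ptr;
  }
  ```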
- f2839d5: Net gets a ComputeThread subclass for async forward/backward
  Compute threads hold blocking queues of forward and backward commands, synchronized according to the layer graph through Net member variables.
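  A minimal sketch of the pattern (simplified; Caffe's actual queues and command types differ): a blocking queue whose condition wait doubles as the interruption point used by InternalThread::Interrupt.

  ```cpp
  #include <boost/thread.hpp>
  #include <queue>

  template <typename T>
  class BlockingQueue {
   public:
    void Push(const T& t) {
      boost::mutex::scoped_lock lock(mutex_);
      queue_.push(t);
      cond_.notify_one();
    }
    T Pop() {
      boost::mutex::scoped_lock lock(mutex_);
      while (queue_.empty()) {
        cond_.wait(lock);  // interruption point: Interrupt() can end the wait
      }
      T t = queue_.front();
      queue_.pop();
      return t;
    }
   private:
    std::queue<T> queue_;
    boost::mutex mutex_;
    boost::condition_variable cond_;
  };

  struct Command { bool forward; int layer_id; };

  // Worker loop: run commands until interrupted while waiting for work.
  void ComputeLoop(BlockingQueue<Command>* q) {
    try {
      for (;;) {
        Command c = q->Pop();
        // dispatch to the layer's Forward or Backward here
        (void)c;
      }
    } catch (boost::thread_interrupted&) {
      // clean shutdown via InternalThread::Interrupt
    }
  }
  ```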
- bf674ef
- f47298d
- 47f74d6
- ee45c17: [tools] caffe time performs initial Forward/Backward together
  This fully exercises the multi-GPU case, and saves time.
- 069bdcd: [tools] caffe time lets Net perform layer Forward/Backward
  This is necessary to ensure that operations are performed on the correct device.
- afb2ac1
- 31b2155
- 4afb9bb
Commits on Mar 28, 2015
- eb4e7f8