Synchronous SGD via layer-wise parallelism #2219
Commits on Mar 27, 2015
- 4fe9305
- c4590db: forward declare instead of including boost/thread.hpp (BVLC#1009)
  This means Caffe::Get has to move to common.cpp and lose its "inline" (with no real performance impact).
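  A minimal sketch of the forward-declaration pattern this commit describes, with simplified names; the thread-local singleton detail is an assumption based on Caffe's common.cpp of that era, not a quote of the commit:

  ```cpp
  // common.hpp (sketch): forward-declare instead of pulling in boost/thread.hpp.
  namespace boost { class thread; }

  namespace caffe {
  class Caffe {
   public:
    static Caffe& Get();  // no longer inline: defined in common.cpp
   private:
    Caffe() {}
  };
  }  // namespace caffe

  // common.cpp (sketch): the heavy include lives only here.
  #include <boost/thread.hpp>

  namespace caffe {
  // One instance per thread; thread_specific_ptr needs the full boost
  // definitions, which is why Get() had to move out of the header.
  static boost::thread_specific_ptr<Caffe> thread_instance_;

  Caffe& Caffe::Get() {
    if (!thread_instance_.get()) {
      thread_instance_.reset(new Caffe());
    }
    return *thread_instance_.get();
  }
  }  // namespace caffe
  ```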
- 70ac334
- 10ac0ff: keep track of layer graph in Net
  Instead of just keeping track of input and output blobs, also keep track of layer dependencies. (Also adjust AppendBottom's argument types to avoid passing an input as a pointer.)
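  A minimal sketch of this kind of dependency tracking (illustrative structures, not the PR's actual Net members): a layer depends on whichever layer most recently produced each of its bottom blobs.

  ```cpp
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  struct LayerGraph {
    // blob name -> index of the layer that most recently produced it
    std::map<std::string, int> blob_producer;
    // layer index -> indices of the layers it consumes blobs from
    std::vector<std::set<int> > bottom_deps;

    void AppendTop(int layer_id, const std::string& blob) {
      blob_producer[blob] = layer_id;
    }
    void AppendBottom(int layer_id, const std::string& blob) {
      if ((int)bottom_deps.size() <= layer_id) bottom_deps.resize(layer_id + 1);
      std::map<std::string, int>::const_iterator it = blob_producer.find(blob);
      if (it != blob_producer.end()) bottom_deps[layer_id].insert(it->second);
    }
  };
  ```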
- 6c2b0b5
  This simplifies the OS X build, and will allow use of the per-thread default stream for running existing layer code asynchronously.
- a67e216: [build] use CUDA 7's per thread default stream
  Note that this may cause issues with code that assumes either explicit or device-level synchronization, which we'll fix in the next commit.
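  For reference, CUDA 7's two opt-in mechanisms for the per-thread default stream (these are CUDA's own flag and macro, not additions from this PR):

  ```cpp
  // Option 1: compile with  nvcc --default-stream per-thread
  // Option 2: define this macro before including any CUDA runtime header:
  #define CUDA_API_PER_THREAD_DEFAULT_STREAM 1
  #include <cuda_runtime.h>

  // With either option, default-stream work launched by different host
  // threads no longer serializes through the single legacy default stream.
  ```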
- c7357b9: always sync the default stream after GPU forward or backward
  This ensures that layers are synchronous with respect to each other, even when layer code doesn't use explicit streams.
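  A minimal sketch of the wrapper idea, using CUDA 7's cudaStreamPerThread handle; the class and method names are simplified stand-ins for Caffe's Layer, not its actual code.

  ```cpp
  #include <cuda_runtime.h>
  #include <vector>

  template <typename Dtype> class Blob;  // placeholder for the sketch

  template <typename Dtype>
  class LayerSketch {
   public:
    virtual ~LayerSketch() {}
    void Forward(const std::vector<Blob<Dtype>*>& bottom,
                 const std::vector<Blob<Dtype>*>& top) {
      Forward_gpu(bottom, top);
      // Layer code may launch kernels on the default stream with no explicit
      // sync; wait here so the next layer (possibly on another thread or
      // device) only ever sees completed results.
      cudaStreamSynchronize(cudaStreamPerThread);
    }
   protected:
    virtual void Forward_gpu(const std::vector<Blob<Dtype>*>& bottom,
                             const std::vector<Blob<Dtype>*>& top) = 0;
  };
  ```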
- 3ac616f
- 832b273
- 6a8525d: always call Layer::Reshape in Layer::Forward
  There are no cases where Forward is called without Reshape, so we can simplify the call structure.
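  A sketch of the simplified call structure, reduced to the CPU path (Caffe's real Forward also dispatches to the GPU path):

  ```cpp
  #include <vector>

  template <typename Dtype> class Blob;  // placeholder for the sketch

  template <typename Dtype>
  class LayerSketch {
   public:
    virtual ~LayerSketch() {}
    Dtype Forward(const std::vector<Blob<Dtype>*>& bottom,
                  const std::vector<Blob<Dtype>*>& top) {
      // Always reshape: cheap when shapes are unchanged, and it removes the
      // separate "call Reshape before Forward" convention for every caller.
      Reshape(bottom, top);
      return Forward_cpu(bottom, top);
    }
   protected:
    virtual void Reshape(const std::vector<Blob<Dtype>*>& bottom,
                         const std::vector<Blob<Dtype>*>& top) = 0;
    virtual Dtype Forward_cpu(const std::vector<Blob<Dtype>*>& bottom,
                              const std::vector<Blob<Dtype>*>& top) = 0;
  };
  ```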
- e5cda03
- c70a21e
- a44b3bf
- 404e61d
- 37edfd9
- 57b813c
- c3e247e: expose boost::thread::interrupt as InternalThread::Interrupt
  This will allow us to cleanly kill compute threads that are waiting for work.
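  A minimal sketch of the exposed call (member names illustrative). boost delivers the interruption at interruption points such as condition-variable waits, which is exactly where an idle compute thread blocks.

  ```cpp
  #include <boost/shared_ptr.hpp>
  #include <boost/thread.hpp>

  class InternalThread {
   public:
    void Interrupt() {
      // Raises boost::thread_interrupted inside the thread at its next
      // interruption point (e.g. a condition-variable wait for work).
      if (thread_) thread_->interrupt();
    }
   protected:
    boost::shared_ptr<boost::thread> thread_;
  };
  ```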
- 1a79768: layers get device and thread_id
  This gives us a way to specify layer-level execution placement for layerwise parallelism, implemented in future commits.
- 84ef229: split layer works across devices
  Split layer gains a param, top_device, which allows tops to live on different, explicitly specified devices. Params are automatically copied and diffs are automatically accumulated. Because the implementation is now device-agnostic, it lives only in the *_cpu functions.
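  A minimal sketch of the cross-device copy such a split needs on the forward pass (a hypothetical helper, not the PR's code); the backward pass would accumulate each top's diff back into the bottom the same way.

  ```cpp
  #include <cuda_runtime.h>
  #include <vector>

  // Copy `count` floats from src on src_device to each top buffer, which may
  // live on a different device.
  void SplitForwardSketch(const float* src, int src_device, size_t count,
                          const std::vector<float*>& tops,
                          const std::vector<int>& top_devices) {
    for (size_t i = 0; i < tops.size(); ++i) {
      if (top_devices[i] == src_device) {
        cudaMemcpy(tops[i], src, count * sizeof(float),
                   cudaMemcpyDeviceToDevice);
      } else {
        // Peer copy handles the cross-device case (staging through the host
        // when direct peer access is unavailable).
        cudaMemcpyPeer(tops[i], top_devices[i], src, src_device,
                       count * sizeof(float));
      }
    }
  }
  ```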
- cf9bb2d: split layers are automatically inserted between devices
  This fills in the top_device param of split layer according to the device params of the connecting layers.
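  A sketch of the insertion rule with simplified structures (not the PR's net-construction code): each consumer of a blob contributes one split top, pinned to that consumer's device.

  ```cpp
  #include <vector>

  // Device param of each layer consuming the blob (illustrative structure).
  struct Consumer { int device; };

  // Fill split's top_device list from the consumers' device params; tops that
  // match the producer's device behave like an ordinary same-device split.
  std::vector<int> SplitTopDevices(const std::vector<Consumer>& consumers) {
    std::vector<int> top_devices;
    top_devices.reserve(consumers.size());
    for (size_t i = 0; i < consumers.size(); ++i) {
      top_devices.push_back(consumers[i].device);
    }
    return top_devices;
  }
  ```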
- 3e3a0eb
- 121b912
- 29cdf53: Net sets device before layer setup
  This is necessary to ensure that buffers are allocated on the correct devices.
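  The underlying CUDA behavior this relies on, in a minimal sketch: allocations follow the calling thread's current device, so the device must be selected before SetUp allocates anything.

  ```cpp
  #include <cuda_runtime.h>
  #include <cstddef>

  // cudaSetDevice is per host thread; a later cudaMalloc lands on that device.
  float* AllocateOnDevice(int device, std::size_t count) {
    float* ptr = 0;
    cudaSetDevice(device);                    // select before allocating
    cudaMalloc(&ptr, count * sizeof(float));  // allocated on `device`
    return ptr;
  }
  ```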
- f2839d5: Net gets a ComputeThread subclass for async forward/backward
  Compute threads hold blocking queues of forward and backward commands, synchronized according to the layer graph through Net member variables.
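  A minimal sketch of the pattern (simplified; Caffe's actual queues and command types differ): a blocking queue whose condition wait doubles as the interruption point used by InternalThread::Interrupt.

  ```cpp
  #include <boost/thread.hpp>
  #include <queue>

  template <typename T>
  class BlockingQueue {
   public:
    void Push(const T& t) {
      boost::mutex::scoped_lock lock(mutex_);
      queue_.push(t);
      cond_.notify_one();
    }
    T Pop() {
      boost::mutex::scoped_lock lock(mutex_);
      while (queue_.empty()) {
        cond_.wait(lock);  // interruption point: Interrupt() can end the wait
      }
      T t = queue_.front();
      queue_.pop();
      return t;
    }
   private:
    std::queue<T> queue_;
    boost::mutex mutex_;
    boost::condition_variable cond_;
  };

  struct Command { bool forward; int layer_id; };

  // Worker loop: run commands until interrupted while waiting for work.
  void ComputeLoop(BlockingQueue<Command>* q) {
    try {
      for (;;) {
        Command c = q->Pop();
        // dispatch to the layer's Forward or Backward here
        (void)c;
      }
    } catch (boost::thread_interrupted&) {
      // clean shutdown via InternalThread::Interrupt
    }
  }
  ```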
- bf674ef
- f47298d
- 47f74d6
- ee45c17: [tools] caffe time performs initial Forward/Backward together
  This fully exercises the multi-GPU case, and saves time.
- 069bdcd: [tools] caffe time lets Net perform layer Forward/Backward
  This is necessary to ensure that operations are performed on the correct device.
- afb2ac1
- 31b2155
- 4afb9bb
Commits on Mar 28, 2015
- eb4e7f8