
Synchronous SGD via layer-wise parallelism #2219

Closed · wants to merge 33 commits

Commits on Mar 27, 2015

  1. thread specific singleton

    cypof authored and longjon committed Mar 27, 2015 (4fe9305)
  2. forward declare instead of including boost/thread.hpp (BVLC#1009)

    This means that Caffe::Get has to be moved to common.cpp, and loses its
    "inline" (but there are no real performance implications).
    longjon committed Mar 27, 2015 (c4590db)
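
A minimal sketch of the pattern these two commits establish, assuming names that match present-day Caffe (`Caffe::Get`, `thread_instance_`): the header only declares `Get()`, and the thread-specific singleton machinery lives out of line in common.cpp, so boost/thread.hpp never has to be included from the header.

```cpp
// common.hpp (sketch): no boost/thread.hpp include needed here; Get() is only
// declared, so the header stays light.
class Caffe {
 public:
  static Caffe& Get();  // returns this thread's instance; defined in common.cpp
  // ... mode, RNG state, device selection, etc.
};

// common.cpp (sketch): the thread-specific singleton behind Caffe::Get.
#include <boost/thread/tss.hpp>

static boost::thread_specific_ptr<Caffe> thread_instance_;

Caffe& Caffe::Get() {
  if (!thread_instance_.get()) {
    thread_instance_.reset(new Caffe());  // lazily create one Caffe per thread
  }
  return *thread_instance_;
}
```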
  3. 70ac334
  4. keep track of layer graph in Net

    Instead of just keeping track of input and output blobs, also keep track
    of layer dependencies. (Also adjust AppendBottom's argument types to
    avoid passing an input as a pointer.)
    longjon committed Mar 27, 2015 (10ac0ff)
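
A hedged illustration of the bookkeeping this commit describes; the names below (`blob_producer_`, `layer_inputs_`, `AppendTop`, `AppendBottom`) are illustrative rather than the PR's actual identifiers.

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch only: while the net is built, record which layer produced each blob;
// a layer's predecessors in the layer graph then fall out of its bottom blobs.
struct LayerGraph {
  std::map<std::string, int> blob_producer_;     // blob name -> producing layer
  std::vector<std::vector<int> > layer_inputs_;  // layer -> layers it depends on

  void AddLayer() { layer_inputs_.push_back(std::vector<int>()); }

  void AppendTop(int layer_id, const std::string& blob_name) {
    blob_producer_[blob_name] = layer_id;
  }

  void AppendBottom(int layer_id, const std::string& blob_name) {
    layer_inputs_[layer_id].push_back(blob_producer_[blob_name]);
  }
};
```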
  5. require CUDA 7

    This simplifies the OS X build, and will allow use of the per-thread
    default stream for running existing layer code asynchronously.
    longjon committed Mar 27, 2015 (6c2b0b5)
  6. [build] use CUDA 7's per thread default stream

    Note that this may cause issues with code that assumes either explicit
    or device-level synchronization, which we'll fix in the next commit.
    longjon committed Mar 27, 2015 (a67e216)
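
For context, CUDA 7's per-thread default stream is a build-time switch; a rough sketch of the flag and what it buys (not the PR's actual build changes):

```cpp
// Build .cu files with:  nvcc --default-stream per-thread ...
// (For host-only translation units not compiled by nvcc, defining
// CUDA_API_PER_THREAD_DEFAULT_STREAM before including cuda_runtime.h has the
// same effect.)
//
// With this enabled, "stream 0" is a distinct stream per host thread, so
// existing layer code that launches on the default stream can run
// concurrently from different threads without being rewritten.
__global__ void scale(float* x, float a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

void launch_from_this_thread(float* d_x, int n) {
  // Lands on this host thread's private default stream.
  scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);
}
```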
  7. always sync the default stream after GPU forward or backward

    This ensures that layers are synchronous with respect to each other,
    even when layer code doesn't use explicit streams.
    longjon committed Mar 27, 2015 (c7357b9)
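
A minimal sketch of the synchronization point being described, assuming it sits right after each layer's Forward_gpu/Backward_gpu dispatch:

```cpp
#include <cuda_runtime.h>

// Sketch: wait for everything this host thread has queued on its (per-thread)
// default stream, so the next layer only starts once this layer's outputs are
// fully written. cudaStreamPerThread is CUDA 7's handle for that stream.
inline void SyncDefaultStream() {
  cudaError_t err = cudaStreamSynchronize(cudaStreamPerThread);
  if (err != cudaSuccess) {
    // Caffe would wrap this in CUDA_CHECK; handle or log the failure here.
  }
}
```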
  8. 3ac616f
  9. 832b273
  10. always call Layer::Reshape in Layer::Forward

    There are no cases where Forward is called without Reshape, so we can
    simplify the call structure.
    longjon committed Mar 27, 2015 (6a8525d)
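
This matches the call structure Layer::Forward still has in mainline Caffe; roughly:

```cpp
// Sketch: Forward always reshapes first, so no caller has to remember a
// separate Reshape pass before running the net.
template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Reshape(bottom, top);  // size the tops (and internal buffers) to the bottoms
  switch (Caffe::mode()) {
  case Caffe::CPU:
    Forward_cpu(bottom, top);
    break;
  case Caffe::GPU:
    Forward_gpu(bottom, top);
    break;
  }
  return Dtype(0);  // loss accumulation omitted from this sketch
}
```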
  11. e5cda03
  12. c70a21e
  13. a44b3bf
  14. add blocking queue for synchronous things

    cypof authored and longjon committed Mar 27, 2015 (404e61d)
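
A hedged sketch of a minimal blocking queue of the kind described (a generic boost-based version, not the PR's code):

```cpp
#include <queue>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>

// Minimal blocking queue: pop() waits until an element is available, which is
// what lets compute threads sleep until work (or a result) is pushed for them.
template <typename T>
class BlockingQueue {
 public:
  void push(const T& t) {
    boost::mutex::scoped_lock lock(mutex_);
    queue_.push(t);
    condition_.notify_one();
  }

  T pop() {
    boost::mutex::scoped_lock lock(mutex_);
    while (queue_.empty()) {
      condition_.wait(lock);  // also a boost interruption point (see commit 17)
    }
    T t = queue_.front();
    queue_.pop();
    return t;
  }

 private:
  std::queue<T> queue_;
  boost::mutex mutex_;
  boost::condition_variable condition_;
};
```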
  15. simplify blocking queue

    longjon committed Mar 27, 2015 (37edfd9)
  16. 57b813c
  17. expose boost::thread::interrupt as InternalThread::Interrupt

    This will allow us to cleanly kill compute threads that are waiting for
    work.
    longjon committed Mar 27, 2015 (c3e247e)
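
Presumably little more than a forwarding call; a sketch assuming InternalThread stores its boost::thread in a member named `thread_`:

```cpp
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>

// Sketch: interruption forwards to boost::thread::interrupt(). The target
// thread is woken at its next interruption point, e.g. the condition-variable
// wait inside a blocking queue pop, and can then unwind and exit cleanly.
class InternalThread {
 public:
  void Interrupt() {
    if (thread_) { thread_->interrupt(); }
  }
  // ... StartInternalThread, InternalThreadEntry, etc.

 protected:
  boost::shared_ptr<boost::thread> thread_;
};
```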
  18. layers get device and thread_id

    This gives us a way to specify layer-level execution placement for
    layerwise parallelism, implemented in future commits.
    longjon committed Mar 27, 2015 (1a79768)
  19. split layer works across devices

    Split layer gains a param, top_device, which allows tops to exist on
    different (explicitly specified) devices. Params are automatically
    copied and diffs are automatically accumulated. Because the
    implementation is now device-agnostic, it's done in (only) the *_cpu
    functions.
    longjon committed Mar 27, 2015 (84ef229)
  20. split layers are automatically inserted between devices

    This fills in the top_device param of split layer according to the
    device params of the connecting layers.
    longjon committed Mar 27, 2015 (cf9bb2d)
  21. 3e3a0eb
  22. 121b912
  23. Net sets device before layer setup

    This is necessary to ensure that buffers are allocated on the correct
    devices.
    longjon committed Mar 27, 2015 (29cdf53)
  24. Net gets a ComputeThread subclass for async forward/backward

    Compute threads hold (blocking) queues of forward or backward commands,
    which are synchronized according to the layer graph through Net member
    variables.
    longjon committed Mar 27, 2015 (f2839d5)
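
A rough sketch of the worker-loop shape the commit describes; all names here (ComputeThread, LayerCommand, commands_) are illustrative, not the PR's actual identifiers, and the real class is nested inside Net.

```cpp
// Sketch: each compute thread owns a blocking queue of layer-level commands.
// Net pushes Forward/Backward commands in dependency order; the thread pops
// and executes them on its assigned device.
struct LayerCommand {
  enum Op { FORWARD, BACKWARD, STOP } op;
  int layer_id;
};

class ComputeThread : public InternalThread {
 public:
  BlockingQueue<LayerCommand> commands_;  // filled by Net, drained here

 protected:
  virtual void InternalThreadEntry() {
    Caffe::SetDevice(device_);             // all work below lands on this device
    while (true) {
      LayerCommand cmd = commands_.pop();  // blocks until Net schedules work
      if (cmd.op == LayerCommand::STOP) { break; }
      if (cmd.op == LayerCommand::FORWARD) {
        net_->ForwardFromTo(cmd.layer_id, cmd.layer_id);
      } else {
        net_->BackwardFromTo(cmd.layer_id, cmd.layer_id);
      }
    }
  }

 private:
  int device_;
  Net<float>* net_;
};
```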
  25. bf674ef
  26. f47298d
  27. enable P2P access

    longjon committed Mar 27, 2015 (47f74d6)
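
The CUDA calls involved are standard; a sketch of enabling peer access between every pair of visible devices (not the PR's exact code):

```cpp
#include <cuda_runtime.h>

// Sketch: let each GPU read/write the others' memory directly where the
// hardware allows it, so cross-device copies (e.g. in the split layer) can go
// peer-to-peer over PCIe instead of bouncing through host memory.
void EnableAllPeerAccess() {
  int count = 0;
  cudaGetDeviceCount(&count);
  for (int i = 0; i < count; ++i) {
    cudaSetDevice(i);
    for (int j = 0; j < count; ++j) {
      if (i == j) continue;
      int can_access = 0;
      cudaDeviceCanAccessPeer(&can_access, i, j);
      if (can_access) {
        cudaDeviceEnablePeerAccess(j, 0);  // flags argument must be 0
      }
    }
  }
}
```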
  28. [tools] caffe time performs initial Forward/Backward together

    This fully exercises the multi-GPU case, and saves time.
    longjon committed Mar 27, 2015 (ee45c17)
  29. [tools] caffe time lets Net perform layer Forward/Backward

    This is necessary to ensure that operations are performed on the correct
    device.
    longjon committed Mar 27, 2015 (069bdcd)
  30. afb2ac1
  31. 31b2155
  32. 4afb9bb

Commits on Mar 28, 2015

  1. [examples] multi-GPU examples

    longjon committed Mar 28, 2015 (eb4e7f8)