
Some partial supports for READ-only inplaced operations (Re-based) #1103

Closed
wants to merge 47 commits

Conversation

zzsfornlp
Contributor

This is a rebased version of PR #929 on top of the current master branch. Sorry, I have not been following the new commits and changes between that PR and the current point, so I may need some help checking whether there are any logical conflicts. Thanks!

Here are some notes about this PR:

  1. Since NoBackprop and FlipGradient need special gradient manipulation, they keep their own backward memory (see the sketch after this list).
  2. For auto-batch mode, batching is currently simply forbidden for those nodes. First, the memory-allocation part of auto-batching seems a little complex to me, so I did not change much there for fear of breaking something; second, in auto-batching mode, memory may matter less anyway.
  3. (TODO) Saving memory for WRITE nodes may need a better mechanism.
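To illustrate the idea, here is a minimal standalone sketch; this is NOT DyNet's actual classes, and `Tensor` and all function names below are invented for illustration. The point is that a READ-only in-place node's forward pass is the identity, so its output can alias the input's buffer, and only the gradient rule differs per node:

```cpp
// Minimal standalone sketch -- NOT DyNet's real classes; "Tensor" and all
// function names here are invented for illustration only.
#include <cassert>
#include <iostream>
#include <vector>

struct Tensor { std::vector<float>* v; };  // a view over some buffer

// READ-only in-place forward: the output simply aliases the input's
// buffer, so no forward memory is allocated and nothing is copied.
Tensor inplace_forward(const Tensor& x) { return Tensor{x.v}; }

// Only the gradient rule differs between the two nodes.
void nobackprop_backward(std::vector<float>& dEdx) {
  // NoBackprop blocks the gradient entirely (illustrative; real code
  // could just skip the accumulation).
  for (float& g : dEdx) g = 0.f;
}

void flipgradient_backward(const std::vector<float>& dEdf,
                           std::vector<float>& dEdx) {
  // FlipGradient accumulates the negated incoming gradient.
  for (std::size_t i = 0; i < dEdx.size(); ++i) dEdx[i] -= dEdf[i];
}

int main() {
  std::vector<float> buf = {1.f, 2.f};
  Tensor x{&buf};
  Tensor y = inplace_forward(x);
  assert(y.v == x.v);  // output aliases input: zero extra forward memory
  std::vector<float> dEdf = {0.5f, 0.5f}, dEdx = {0.f, 0.f};
  flipgradient_backward(dEdf, dEdx);
  std::cout << "dEdx[0] after flip: " << dEdx[0] << "\n";  // prints -0.5
}
```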

zzsfornlp and others added 30 commits November 26, 2017 21:28
Sharing memories for Reshape(f/b), NoBackprop(f), FlipGradient(f).
* Fixed pernicious bug in autobatching

* Reverted to async and added stream
* Add NoneType assert for list arguments in _dynet.pyx.

* remove additional assert
* fix same_dims check for cmult fwd

* improved same_dims checks

* Removed error in affine transform arg check

* Some refactoring of broadcasting

* Better profiling

* Updates for cwise

* Updated broadcasts

* Fixed autobatch profile for csum
* Added more information in case of memory allocation error

* Added two more lines for clarification

* Added deviceID in cudaErrorMemoryAllocation error message
Trivial fix to make doxygen url work
some sbt versions ignored the options if they were in the wrong place
* Add pool memory info for profiling.

* update

* Add profile info when out of memory.

* move show_pool_mem_info out of CG class and add it into low-level control code.
akoehn and others added 17 commits December 19, 2017 10:59
* make {Nary, Unidirectional}TreeLSTMBuilder non-abstract again

Commit bf29c18 added set_h_impl to
RNNBuilder as an abstract method, and the TreeLSTMBuilders became
abstract. This was fixed for the BidirectionalTreeLSTMBuilder but not
the other two. This commit moves the stub throwing a runtime error
from BidirectionalTreeLSTMBuilder to the TreeLSTMBuilder super class.

* TreeLSTMs: add nodes in arbitrary order; swig bindings

Main feature:
TreeLSTMs previously had to be constructed with the tree's nodes added
in index order. This is impractical when the nodes are already
sequentially ordered but the child nodes do not come first, e.g. when
working with dependency trees. If set_num_elements is called, a fixed
number of node slots is reserved and nodes can be added in any order,
as long as each node's children are added before it (a usage sketch
follows this commit message).

The old behavior still works as usual.

Also:
 - Add documentation to treelstm C++ code
 - add swig bindings
 - add scala wrapper for uni- and bidirectional tree lstm
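As a rough illustration of the reservation scheme described above, here is a minimal standalone sketch. This is NOT DyNet's TreeLSTM API: only the name `set_num_elements` comes from the commit; `TreeStates` and `add_node` are made up for illustration.

```cpp
// Minimal standalone sketch -- NOT DyNet's TreeLSTM API. Only the name
// set_num_elements comes from the commit; TreeStates/add_node are made up.
#include <cassert>
#include <optional>
#include <vector>

struct TreeStates {
  std::vector<std::optional<float>> h;  // one reserved state slot per node
  void set_num_elements(std::size_t n) { h.assign(n, std::nullopt); }
  void add_node(std::size_t id, const std::vector<std::size_t>& children,
                float x) {
    float sum = x;
    for (std::size_t c : children) {
      assert(h[c].has_value() && "children must be added first");
      sum += *h[c];
    }
    h[id] = sum;  // stand-in for the actual LSTM cell update
  }
};

int main() {
  TreeStates t;
  t.set_num_elements(3);        // reserve slots up front
  // Dependency-tree style: the root has id 0, but leaves are added first.
  t.add_node(1, {}, 1.f);
  t.add_node(2, {}, 2.f);
  t.add_node(0, {1, 2}, 0.5f);  // fine: both children already exist
  assert(*t.h[0] == 3.5f);
}
```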

* scala uni- and bidirectional tree lstm wrapper
* added complex structures page to docs

* renamed complex structures page
* Removed devices.h from .h files

* Made C++ compile

* Fix python

* Fixed examples and tests

* Fixed compile on Linux and CUDA

* Fixed bug on Windows?

* Cleaned headers of examples
* Run Travis CI on release tags

* Create sdist before build to avoid unnecessary files

* Fix replacement in .appveyor.yml
* Enhanced implementation of the --dynet-gpu option on the Python end

* fix copy list bug
@neubig
Contributor

neubig commented Dec 22, 2017

OK, I finally got around to taking a look. First, thanks again for contributing this!

I made a few changes to ensure that things don't break with autobatching. The biggest one is that the behavior of the nodes themselves is not modified; instead, forward or backward is simply skipped in the executor (sketched below). This is useful because the executor is now solely in charge of handling inplacing properly, and if inplacing is not supported, things work as they did before.

I'm going to do a few more tests and also make sure that the memory profiling code is working properly, then probably merge this.
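A minimal sketch of that executor-side dispatch; this is NOT DyNet's executor, and every type and name below is invented to illustrate the idea:

```cpp
// Minimal standalone sketch -- NOT DyNet's executor; every type and name
// here is invented to illustrate the dispatch described above.
#include <cassert>
#include <vector>

enum class InplaceType { NONE, READ };  // hypothetical per-node flag

struct Node {
  InplaceType inplace = InplaceType::NONE;
  int arg = -1;  // index of the single input node; -1 means the graph input
};

// The nodes are untouched; the executor alone decides whether to call
// forward() or to just alias the argument's buffer and skip the call.
std::vector<std::vector<float>*> run_forward(
    const std::vector<Node>& nodes, std::vector<float>& input,
    std::vector<std::vector<float>>& storage) {
  std::vector<std::vector<float>*> values(nodes.size());
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    std::vector<float>* in =
        nodes[i].arg < 0 ? &input : values[nodes[i].arg];
    if (nodes[i].inplace == InplaceType::READ) {
      values[i] = in;  // share memory, skip the node's forward() entirely
    } else {
      storage.emplace_back(*in);                 // allocate fresh output
      for (float& v : storage.back()) v *= 2.f;  // stand-in for forward()
      values[i] = &storage.back();
    }
  }
  return values;
}

int main() {
  std::vector<float> x = {1.f, 2.f};
  std::vector<std::vector<float>> storage;
  storage.reserve(8);  // keep pointers into storage stable for this sketch
  std::vector<Node> nodes = {{InplaceType::READ, -1},  // e.g. NoBackprop
                             {InplaceType::NONE, 0}};  // a normal node
  auto vals = run_forward(nodes, x, storage);
  assert(vals[0] == &x);         // the READ node aliased the graph input
  assert((*vals[1])[0] == 2.f);  // the normal node computed a new buffer
}
```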

@zzsfornlp
Contributor Author

Thanks a lot for your help!

@neubig neubig closed this Jan 11, 2018
@neubig
Contributor

neubig commented Jan 11, 2018

#1156 was merged!
