LSTM node #729
Conversation
Moved LSTM node to its own source files
Fix several compile errors and extra scratch memory
Great! Excited about it!
If I counted correctly, it should currently be at least 18*hidden_dim, and 20*hidden_dim if using dropout.
// non-linearities
fx.tbvec().slice(indices_i, sizes_3).device(*dev.edevice) = fx.tbvec().slice(indices_i, sizes_3).unaryExpr(scalar_logistic_sigmoid_op<float>());
Very minor comment: I think you can do .sigmoid() here.
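For context, the suggested .sigmoid() is Eigen's built-in elementwise shorthand for the same logistic function the custom functor computes via unaryExpr. A minimal plain-C++ sketch of that equivalence (illustrative helper names, not dynet's or Eigen's actual code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Custom unary functor, mirroring scalar_logistic_sigmoid_op in the diff.
struct scalar_logistic_sigmoid_op {
  float operator()(float x) const { return 1.0f / (1.0f + std::exp(-x)); }
};

// Elementwise application, standing in for tensor.unaryExpr(op).
std::vector<float> unary_expr(const std::vector<float>& v,
                              const scalar_logistic_sigmoid_op& op) {
  std::vector<float> out(v.size());
  for (size_t i = 0; i < v.size(); ++i) out[i] = op(v[i]);
  return out;
}

// Direct shorthand, standing in for tensor.sigmoid().
std::vector<float> sigmoid(const std::vector<float>& v) {
  std::vector<float> out(v.size());
  for (size_t i = 0; i < v.size(); ++i)
    out[i] = 1.0f / (1.0f + std::exp(-v[i]));
  return out;
}
```

Both spellings compute the same values; the built-in form is just shorter and lets Eigen pick a vectorized implementation.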
Thanks Matthias! I have a quick question: any guess as to why it is slower? Is this fixable?
Hmm, I don't see a particular reason why it should be slower, so yeah I assume that should be just a matter of doing some more profiling / code-level optimization. If anything we could probably make it slightly faster because we have more opportunity for batching things together, such as the 3 sigmoids.
Actually, now that you made me think about it, Paul: a likely reason is that the i,f,o,g parts are no longer separated in memory, and so the pointwise operations are performed by striding over the memory. If that slows things down, it should be easy to fix by copying them to separate regions of scratch memory (same as the currently implemented LSTM is doing, except we release the memory afterwards).
Thanks, this is great! I'm going to merge this for now, and we can take a look at the remaining speed issues in the future.
This attempts to reduce the per-timestep LSTM memory consumption to a minimum by avoiding the creation of intermediate nodes. The memory allocated is now 6*hidden_dim for the forward and backward passes respectively: hidden state, cell state, and the result of the 4 matrix multiplies + nonlinearities. In an ideal case we wouldn’t have to save the latter in the backward pass, but due to the complex LSTM dependencies I don’t see a good way to accomplish that without hardcoding the complete loop over the sequence, which would make things very inflexible.
Implemented are the vanilla LSTM, dropout, and weight noise. Dropout and weight noise require no additional memory. Preliminary experiments indicate a smaller overall memory footprint at slightly slower speed.
The above is achieved via 3 new dynet nodes:
gates_t = vanilla_lstm_gates(x_t, h_tm1, Wx, Wh, b, dropout_mask_x, dropout_mask_h, weightnoise_std)
c_t = vanilla_lstm_c(c_tm1, gates_t)
h_t = vanilla_lstm_h(c_t, gates_t)