Some question about the code and training time. #4

Open
sevenights opened this issue Jul 4, 2019 · 3 comments
Comments

@sevenights

Thanks for your paper and code.

But I'm confused by some of the code.

In src/model.py

function get_score() -> function inner_one_step()

update_gate = T.exp(T.dot(ugW[:ln+nhiddens,an+ln-nhiddens:an+ln+ln],com)+ugb[an+ln-nhiddens:an+ln+ln]).reshape((len+1,nhiddens))

and in src/tools.py

function get_word()

update_gate = np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln]).reshape((len+1,ndims))

I think they should be

update_gate = T.exp(T.dot(ugW[:ln+nhiddens,an+ln-nhiddens:an+ln+ln],com)+ugb[an+ln-nhiddens:an+ln+ln]).reshape((nhiddens, len+1)).transpose()

and

update_gate = np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln]).reshape((ndims, len+1)).transpose()

Since the code

np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln])

represents a vector

[e1_1, e1_2, ..., e1_ndims, e2_1, e2_2, ..., eln_1, eln_2, ..., eln_ndim]

where ei_j is derived from the jth element of the ith character.
So reshape((nhiddens, len+1)) reshapes the vector to

[e1_1, e1_2, ..., e1_ln,
e1_ln+1, e1_ln+2,...e1_ln+ln
...
eln_ndim-ln+1, ..., eln_ndim-1, eln_ndim]

which I think should be

[e1_1, e1_2, ..., e1_ndim,
e2_1, e2_2, ..., e2_ndim,
...
eln_1, eln_2, ..., eln_ndim]
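The layout difference between the two groupings comes down to NumPy's row-major reshape semantics. A small sketch (with hypothetical toy sizes standing in for len+1 characters and ndims units, not the repo's actual dimensions) shows that reshape((n_chars, ndims)) and reshape((ndims, n_chars)).T group a character-major flat vector very differently:

```python
import numpy as np

# Hypothetical toy sizes: 3 "characters", 4 units each.
n_chars, ndims = 3, 4

# Flat vector laid out character-major: [e1_1..e1_ndims, e2_1..e2_ndims, ...]
flat = np.arange(n_chars * ndims)

a = flat.reshape((n_chars, ndims))    # row i = the ith character's ndims values
b = flat.reshape((ndims, n_chars)).T  # row i = every ndims-th element, a strided grouping

print(list(a[0]))  # [0, 1, 2, 3]  -> contiguous values of the first character
print(list(b[0]))  # [0, 3, 6, 9]  -> one value from each row of the reshape
```

Both results have shape (n_chars, ndims), so the shapes alone do not reveal which grouping the downstream code actually receives; only the element ordering differs.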

To see the difference, I revised the code and ran it on CPU. But the memory usage rose from 3 GB (epoch 1) to 30 GB (epoch 8), and it took 8000 s per epoch.

The original code took 7000 s per epoch and used 2.6 GB.

Did I misunderstand something?

Thanks!

@sevenights (Author)

I don't know if I have expressed it clearly; my Chinese is much better than my English :D

@jcyk (Owner) commented Jul 4, 2019

@sevenights
Hi,

I treat Theano as a legacy framework in deep learning and am not familiar with it anymore.

I would refer you to a better implementation there. It uses a modern library, i.e., DyNet.

@sevenights (Author)

But in greedyCWS, there is no need to calculate the update gate for the word representation. In the paper:

an update gate z (as in Figure 2), which has been shown helpless to the performance but requires huge computational cost according to our empirical study.

So I read dy_model.py and have the same question.

update_gate = dy.transpose(dy.concatenate_cols([dy.softmax(dy.pickrange(update_logits,i*(wlen+1),(i+1)*(wlen+1))) for i in xrange(self.options['ndims'])]))

which I thought to be:

update_gate = dy.concatenate_cols([dy.softmax(dy.pickrange(update_logits,i*(self.options['ndims']+1),(i+1)*(self.options['ndims']+1))) for i in xrange(wlen)])
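The two DyNet lines above partition the flat update_logits vector differently. A NumPy sketch (hypothetical toy sizes for wlen and ndims, chosen only to make the slices visible) contrasts the two slicing schemes:

```python
import numpy as np

# Hypothetical toy sizes standing in for wlen characters and ndims units.
wlen, ndims = 2, 3
update_logits = np.arange(ndims * (wlen + 1))  # flat logits, length ndims*(wlen+1) = 9

# Partition in dy_model.py: ndims slices, each of length wlen+1.
per_dim = [update_logits[i * (wlen + 1):(i + 1) * (wlen + 1)] for i in range(ndims)]

# Partition suggested above: wlen slices, each of length ndims+1.
# With these toy sizes it covers wlen*(ndims+1) = 8 elements of the 9-element
# vector, so the two schemes cut the same flat vector into incompatible groups.
per_char = [update_logits[i * (ndims + 1):(i + 1) * (ndims + 1)] for i in range(wlen)]

print([list(s) for s in per_dim])   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print([list(s) for s in per_char])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The softmax in the DyNet code is then applied within each slice, so which elements land in the same slice changes what the softmax normalizes over.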
