Some question about the code and training time. #4

Open
sevenights opened this issue Jul 4, 2019 · 3 comments
Comments

@sevenights

Thanks for your paper and code.

But I'm confused by some of the code.

In src/model.py

function get_score() -> function inner_one_step()

update_gate = T.exp(T.dot(ugW[:ln+nhiddens,an+ln-nhiddens:an+ln+ln],com)+ugb[an+ln-nhiddens:an+ln+ln]).reshape((len+1,nhiddens))

and in src/tools.py

function get_word()

update_gate = np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln]).reshape((len+1,ndims))

I think they should be

update_gate = T.exp(T.dot(ugW[:ln+nhiddens,an+ln-nhiddens:an+ln+ln],com)+ugb[an+ln-nhiddens:an+ln+ln]).reshape((nhiddens, len+1)).transpose()

and

update_gate = np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln]).reshape((ndims, len+1)).transpose()

Since the code

np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln])

represents a vector

[e1_1, e1_2, ..., e1_ndims, e2_1, e2_2, ..., eln_1, eln_2, ..., eln_ndim]

where ei_j is derived from the jth element of the ith character.
So reshape((nhiddens, len+1)) reshapes the vector to

[e1_1, e1_2, ..., e1_ln,
e1_ln+1, e1_ln+2,...e1_ln+ln
...
eln_ndim-ln+1, ..., eln_ndim-1, eln_ndim]

which I think should be

[e1_1, e1_2, ..., e1_ndim,
e2_1, e2_2, ..., e2_ndim,
...
eln_1, eln_2, ..., eln_ndim]
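The layout difference between the two groupings comes down to NumPy's row-major reshape semantics. A small sketch (with hypothetical toy sizes standing in for len+1 characters and ndims units, not the repo's actual dimensions) shows that reshape((n_chars, ndims)) and reshape((ndims, n_chars)).T group a character-major flat vector very differently:

```python
import numpy as np

# Hypothetical toy sizes: 3 "characters", 4 units each.
n_chars, ndims = 3, 4

# Flat vector laid out character-major: [e1_1..e1_ndims, e2_1..e2_ndims, ...]
flat = np.arange(n_chars * ndims)

a = flat.reshape((n_chars, ndims))    # row i = the ith character's ndims values
b = flat.reshape((ndims, n_chars)).T  # row i = every ndims-th element, a strided grouping

print(list(a[0]))  # [0, 1, 2, 3]  -> contiguous values of the first character
print(list(b[0]))  # [0, 3, 6, 9]  -> one value from each row of the reshape
```

Both results have shape (n_chars, ndims), so the shapes alone do not reveal which grouping the downstream code actually receives; only the element ordering differs.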

To see the difference, I revised the code and ran it on CPU. But the memory usage rose from 3 GB (epoch 1) to 30 GB (epoch 8), and it took 8000 s per epoch.

The original code took 7000 s per epoch and used 2.6 GB.

Did I misunderstand something?

Thanks!

@sevenights (Author)

I don't know if I have expressed it clearly; my Chinese is much better than my English :D

@jcyk (Owner) commented Jul 4, 2019

@sevenights
Hi,

I treat Theano as a legacy framework in deep learning and am not familiar with it anymore.

I would refer you to a better implementation there. It uses a modern library, i.e., DyNet.

@sevenights (Author)

But in greedyCWS, there is no need to calculate the update gate for the word representation. In the paper:

an update gate z (as in Figure 2), which has been shown helpless to the performance but requires huge computational cost according to our empirical study.

So I read dy_model.py and have the same question.

update_gate = dy.transpose(dy.concatenate_cols([dy.softmax(dy.pickrange(update_logits,i*(wlen+1),(i+1)*(wlen+1))) for i in xrange(self.options['ndims'])]))

which I thought to be:

update_gate = dy.concatenate_cols([dy.softmax(dy.pickrange(update_logits,i*(self.options['ndims']+1),(i+1)*(self.options['ndims']+1))) for i in xrange(wlen)])
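The two DyNet lines above partition the flat update_logits vector differently. A NumPy sketch (hypothetical toy sizes for wlen and ndims, chosen only to make the slices visible) contrasts the two slicing schemes:

```python
import numpy as np

# Hypothetical toy sizes standing in for wlen characters and ndims units.
wlen, ndims = 2, 3
update_logits = np.arange(ndims * (wlen + 1))  # flat logits, length ndims*(wlen+1) = 9

# Partition in dy_model.py: ndims slices, each of length wlen+1.
per_dim = [update_logits[i * (wlen + 1):(i + 1) * (wlen + 1)] for i in range(ndims)]

# Partition suggested above: wlen slices, each of length ndims+1.
# With these toy sizes it covers wlen*(ndims+1) = 8 elements of the 9-element
# vector, so the two schemes cut the same flat vector into incompatible groups.
per_char = [update_logits[i * (ndims + 1):(i + 1) * (ndims + 1)] for i in range(wlen)]

print([list(s) for s in per_dim])   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print([list(s) for s in per_char])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The softmax in the DyNet code is then applied within each slice, so which elements land in the same slice changes what the softmax normalizes over.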
