
Questions about Enhanced Speaker #9

Open

ZhuFengdaaa opened this issue Sep 10, 2019 · 1 comment

Comments

@ZhuFengdaaa

You claim an enhanced version of the Speaker in Section 3.4.3. However, the geographic information and actions are only used to compute the attention weights over the features.

I have difficulty understanding why g and a are not used to compute the context directly. Could you point to any related work that motivates this design?

@ZhuFengdaaa ZhuFengdaaa changed the title Questions about Motivation of Improvements on Speaker Questions about Enhanced Speaker Sep 10, 2019
@airsplay
Owner

Thanks for pointing it out.

I used a "fused hidden state" trick when implementing the attention layer here:

```python
h_tilde = torch.cat((weighted_context, h), 1)
```

Mathematically, it "adds" the information of the query into the retrieved context vector:

```
c   = Att(query, {key})
out = FC([query, c])
```

Thus, the information of g and a would still be captured by the second LSTM.
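
For concreteness, here is a minimal PyTorch sketch of such an attention layer with the fused hidden state. The class and variable names (`SoftDotAttention`, `dim`, etc.) are illustrative assumptions and may not match the repository's exact code:

```python
import torch
import torch.nn as nn

class SoftDotAttention(nn.Module):
    """Soft dot-product attention with a fused hidden state.

    A sketch under assumed shapes: the query h attends over a set of
    context vectors, and the output concatenates h with the retrieved
    context so downstream layers see both.
    """

    def __init__(self, dim):
        super().__init__()
        self.linear_in = nn.Linear(dim, dim, bias=False)       # projects the query
        self.linear_out = nn.Linear(dim * 2, dim, bias=False)  # FC([query, c])
        self.softmax = nn.Softmax(dim=1)
        self.tanh = nn.Tanh()

    def forward(self, h, context):
        # h:       (batch, dim)       -- query (e.g., LSTM hidden state)
        # context: (batch, seq, dim)  -- keys/values to attend over
        target = self.linear_in(h).unsqueeze(2)           # (batch, dim, 1)
        attn = torch.bmm(context, target).squeeze(2)      # (batch, seq)
        attn = self.softmax(attn)                         # attention weights
        weighted_context = torch.bmm(
            attn.unsqueeze(1), context).squeeze(1)        # (batch, dim) = c
        # Fused hidden state: concatenate the query with the retrieved
        # context, so the query's information (e.g., g, a) is carried
        # forward even though it only produced the attention weights above.
        h_tilde = torch.cat((weighted_context, h), 1)     # (batch, 2*dim)
        h_tilde = self.tanh(self.linear_out(h_tilde))     # out = FC([query, c])
        return h_tilde, attn
```

Feeding `h_tilde` (rather than `weighted_context` alone) into the second LSTM is what lets g and a influence later steps even though they only set the attention weights.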

I am sorry that I forgot to mention this in the paper.
