Why does the video not pass through the encoder? #11

Wang-Xiaodong1899 · 2022-05-12T03:16:19Z

Hi lucidrains! Thanks for providing a great repo, which makes the NUWA paper much easier to understand.
I have a question:
In the NUWA paper, the inputs to the Encoder are the caption tokens (caption condition) and the video tokens (3DNA condition). So, as I understand it, the video token sequence should fully self-attend in the Encoder, and the Encoder outputs should then condition the Decoder. Is that right?
The Decoder in your repo is as follows:
[image: Decoder]
It has causal self-attention and text conditioning, as expected. But according to the definition in the paper, the condition includes both the text condition and the 3DNA condition, and both of them condition the Decoder. Is my understanding correct? I am just curious about how conditioning works in the NUWA paper.
The Encoder in your repo is only a text encoder, so the video never passes through an encoder to condition the Decoder.
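To make the question concrete, the conditioning described above can be sketched as plain attention masks. This is only an illustration of my reading of the paper, not code from the repo: the helper names and token counts are hypothetical, concatenating the two conditions for cross-attention is an assumption, and full self-attention is used in place of 3DNA's local neighbourhoods for simplicity.

```python
# Illustration only: hypothetical helpers, not part of nuwa-pytorch.

def full_mask(n):
    """Encoder-style mask: every token may attend to every token."""
    return [[True] * n for _ in range(n)]

def causal_mask(n):
    """Decoder-style mask: token i may attend only to tokens 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def cross_mask(n_target, n_condition):
    """Cross-attention mask: every target token sees the whole condition."""
    return [[True] * n_condition for _ in range(n_target)]

# Toy sequence lengths (assumed for the example).
n_text, n_video_cond, n_target = 3, 4, 5

# My reading of the paper: each condition is encoded with
# non-causal self-attention over its own tokens.
text_encoder_mask = full_mask(n_text)
video_encoder_mask = full_mask(n_video_cond)

# The decoder is causal over the tokens being generated ...
decoder_self_mask = causal_mask(n_target)

# ... and cross-attends to both encoded conditions
# (assumed here to be concatenated along the sequence axis).
decoder_cross_mask = cross_mask(n_target, n_text + n_video_cond)

# What I believe the repo does instead: only text conditions the decoder.
text_only_cross_mask = cross_mask(n_target, n_text)
```

The difference I am asking about is the width of the cross-attention: text plus video condition tokens in my reading of the paper, versus text tokens only in the repo.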

Looking forward to your reply! Thanks!
