Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why the video does not pass through the encoder? #11

Open
Wang-Xiaodong1899 opened this issue May 12, 2022 · 0 comments
Open

Why the video does not pass through the encoder? #11

Wang-Xiaodong1899 opened this issue May 12, 2022 · 0 comments

Comments

@Wang-Xiaodong1899
Copy link

Hi! lucidrains. Thanks for providing a great repo which is convenient to understand the NUWA paper.
I have a question as follows:
In the NUWA paper, we can see that the inputs of the Encoder are caption tokens (caption condition) and the video tokens (3DNA condition). So, in my eye, the video tokens sequence should fully self-attend in the Encoder, right? And then, the outputs condition the Decoder.
The Decoder provided by you is as following.
截屏2022-05-12 上午11 07 12.
It has causal self-attention and text-condition as we expected. But from the definition in paper, the condition contains the text-condition and 3DNA condition, and these two condition the Decoder. Is my opinion right? I am just curious about the condition in the NUWA paper.
The Encoder in your repo is only the Text-Encoder, but the video does not pass through the encoder to condition the Encoder.

Looking forward to your reply! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant