Hello authors, I ran into this error during both inference and training:

    File "/PromptCC/models_CC.py", line 264, in forward
        clip_emb_A, img_feat_A = self.clip_model.encode_image(img_A)
    ValueError: too many values to unpack (expected 2, got 1)

It looks like stock CLIP outputs a single feature vector for the whole image, whereas img_feat_A here seems to be a patch-level feature map of shape (N, h*w, 512). Could you share how you modified CLIP so that it outputs patch-level image features? Thanks!
Sorry, that error is because we modified the source code of the CLIP package. You can fix it by changing the forward of VisionTransformer in CLIP.model.VisionTransformer as follows:
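The authors' actual patch did not survive in this thread, so the snippet below is only a plausible sketch, not their code. It shows a minimal ViT mirroring CLIP's VisionTransformer structure whose forward returns both a global embedding and patch-level features: unlike stock CLIP, ln_post and the output projection are applied to all tokens (not just the CLS token), and the CLS token is then split off from the patch tokens. All class and parameter names here are illustrative assumptions.

```python
# Hypothetical sketch of a CLIP-style VisionTransformer whose forward
# returns (global_embedding, patch_features) instead of a single vector.
# NOT the PromptCC authors' actual patch; structure mimics CLIP's ViT.
import torch
import torch.nn as nn

class PatchLevelViT(nn.Module):
    def __init__(self, input_resolution=224, patch_size=32, width=768,
                 layers=2, heads=8, output_dim=512):
        super().__init__()
        self.conv1 = nn.Conv2d(3, width, kernel_size=patch_size,
                               stride=patch_size, bias=False)
        n_patches = (input_resolution // patch_size) ** 2
        self.class_embedding = nn.Parameter(torch.randn(width) * width ** -0.5)
        self.positional_embedding = nn.Parameter(
            torch.randn(n_patches + 1, width) * width ** -0.5)
        self.ln_pre = nn.LayerNorm(width)
        enc_layer = nn.TransformerEncoderLayer(width, heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, layers)
        self.ln_post = nn.LayerNorm(width)
        self.proj = nn.Parameter(torch.randn(width, output_dim) * width ** -0.5)

    def forward(self, x):
        x = self.conv1(x)                              # (N, width, h, w)
        x = x.flatten(2).permute(0, 2, 1)              # (N, h*w, width)
        cls = self.class_embedding.expand(x.shape[0], 1, -1)
        x = torch.cat([cls, x], dim=1)                 # (N, 1 + h*w, width)
        x = self.ln_pre(x + self.positional_embedding)
        x = self.transformer(x)
        # Key change vs. stock CLIP: apply ln_post and the projection to
        # ALL tokens, so the patch tokens also come out at output_dim.
        x = self.ln_post(x) @ self.proj                # (N, 1 + h*w, output_dim)
        clip_emb = x[:, 0, :]                          # (N, output_dim) global
        img_feat = x[:, 1:, :]                         # (N, h*w, output_dim) patches
        return clip_emb, img_feat
```

With this two-value return, an unpacking call such as `clip_emb_A, img_feat_A = encode_image(img_A)` no longer raises the ValueError, and `img_feat_A` has the expected (N, h*w, 512) shape.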