
TinyChart: position-embedding length mismatch in the visual encoder #86

Open
nth2000 opened this issue Jun 23, 2024 · 2 comments

@nth2000

nth2000 commented Jun 23, 2024

First of all, thank you for the excellent work.

I am currently reproducing the training pipeline on the TinyChart data, building on the TinyLLaVA model. I noticed that the visual encoder in bczhou/TinyLLaVA-3.1B-SigLIP has image_size 384, while vit_add_tome.py changes image_size in the config to 768.

As a result, at model initialization the SigLIP position embedding is sized for image_size=768, but the position embedding in the bczhou/TinyLLaVA-3.1B-SigLIP checkpoint was trained with image_size=384. Loading the checkpoint parameters therefore raises a shape-mismatch error at runtime.

Could you advise how to resolve this error? Many thanks!

@nth2000 nth2000 changed the title from "TinyChar: position-embedding length mismatch in the visual encoder" to "TinyChart: position-embedding length mismatch in the visual encoder" Jun 23, 2024
@zhangliang-04
Collaborator

Hi,
TinyChart extends TinyLLaVA's position embedding by interpolation. You can try the following code:

import math
import torch
import torch.nn.functional as F

def get_abs_pos(abs_pos, tgt_size):
    # Interpolate an absolute position-embedding table to a new length.
    # abs_pos: (L, C) table, where L is a square number of patches.
    # tgt_size: target length M (also a square number).
    # Returns an (M, C) tensor.
    src_size = int(math.sqrt(abs_pos.size(0)))
    tgt_size = int(math.sqrt(tgt_size))
    dtype = abs_pos.dtype

    if src_size != tgt_size:
        # Reshape (L, C) -> (1, C, src, src), bicubically resize the grid
        # to (tgt, tgt), then flatten back to (M, C) in the original dtype.
        return F.interpolate(
            abs_pos.float().reshape(1, src_size, src_size, -1).permute(0, 3, 1, 2),
            size=(tgt_size, tgt_size),
            mode="bicubic",
            align_corners=False,
        ).permute(0, 2, 3, 1).flatten(0, 2).to(dtype=dtype)
    else:
        return abs_pos

# Resize the checkpoint's position embedding from 384x384 to 768x768 input.
# With patch size 14, the target length is (768 // 14) ** 2 = 2916.
model = torch.load('pytorch_model.bin')
target_resolution = 768
target_length = (target_resolution // 14) ** 2
print(target_length)  # 2916
model['vision_model.embeddings.position_embedding.weight'] = get_abs_pos(
    model['vision_model.embeddings.position_embedding.weight'], target_length)
torch.save(model, 'SigLIP-768/pytorch_model.bin')
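As a quick sanity check of the interpolation step (a sketch using a dummy random table; the embedding width 1152 is an assumption, the patch size 14 is from the snippet above), the bicubic resize maps a 384-resolution table of (384 // 14)² = 729 rows to (768 // 14)² = 2916 rows:

```python
import math
import torch
import torch.nn.functional as F

# Dummy position-embedding table: (729, 1152), i.e. a 27x27 patch grid.
src = torch.randn((384 // 14) ** 2, 1152)
side_src = int(math.sqrt(src.size(0)))   # 27
side_tgt = 768 // 14                     # 54

# Same reshape -> bicubic interpolate -> flatten path as get_abs_pos above.
resized = F.interpolate(
    src.reshape(1, side_src, side_src, -1).permute(0, 3, 1, 2),
    size=(side_tgt, side_tgt),
    mode="bicubic",
    align_corners=False,
).permute(0, 2, 3, 1).flatten(0, 2)

print(resized.shape)  # torch.Size([2916, 1152])
```

The key point is that the embedding is treated as a 2-D grid of per-patch vectors rather than a flat sequence, so resizing preserves the spatial layout of the positions.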

@yanchuqiao


Hi, I have run into the same problem. What is the solution?
