ZERO3 + Offload CPU Error when fine-tuning InternLM-XComposer2 #374

Open
Coobiw opened this issue Jul 12, 2024 · 0 comments
Coobiw commented Jul 12, 2024

Hi, thanks for your great work! I am fine-tuning InternLM-XComposer2 (the projection layer and the whole LLM are unfrozen; the ViT is frozen). To avoid OOM, I use ZeRO-3 and offload the optimizer to CPU (by changing https://github.com/InternLM/InternLM-XComposer/blob/main/InternLM-XComposer-2.0/finetune/ds_config_zero2.json#L17 to cpu). This raises the error below, while the original ds_config_zero2.json does not. How can I solve it? Thanks for your advice and reply!
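For reference, a minimal sketch of what the modified configuration presumably looks like, written as a Python dict mirroring the JSON file (the surrounding keys are assumptions; only the stage and the offload_optimizer device reflect the change described above):

```python
# Minimal sketch of the modified DeepSpeed config (assumed, not the exact
# repository file): ds_config_zero2.json with ZeRO stage 3 and the optimizer
# offloaded to CPU.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # raised from 2
        "offload_optimizer": {
            "device": "cpu",      # the L17 change described above
            "pin_memory": True,   # assumed; a common companion setting
        },
    },
}
```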

Error Message:

Traceback (most recent call last):
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/ChartLLM/InternLM-XComposer/finetune/finetune_smoe.py", line 396, in <module>
    train()
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/ChartLLM/InternLM-XComposer/finetune/finetune_smoe.py", line 297, in train
    model = transformers.AutoModelForCausalLM.from_pretrained(
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2966, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 506, in wrapper
    f(module, *args, **kwargs)
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 67, in __init__
    self.vit = build_vision_tower()
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/build_mlp.py", line 11, in build_vision_tower
    return CLIPVisionTower(vision_tower)
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 506, in wrapper
    f(module, *args, **kwargs)
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/build_mlp.py", line 59, in __init__
    self.resize_pos()
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/build_mlp.py", line 85, in resize_pos
    pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size,
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 24, 24, 0] because the unspecified dimension size -1 can be any value and is ambiguous
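The reshape fails because, with ZeRO-3, transformers builds the model inside deepspeed.zero.Init, so every parameter is partitioned at construction time, and the position-embedding weight that resize_pos reads is a zero-element placeholder (which is also why the last dimension of the target shape is 0). A possible workaround, shown as a sketch only (the helper name is hypothetical and gathering before reading is not the repository's confirmed fix, though deepspeed.zero.GatheredParameters is a real DeepSpeed API), is to materialize the parameter before reshaping it:

```python
import deepspeed
import torch

def materialized_weight(pos_embedding: torch.nn.Embedding) -> torch.Tensor:
    """Return a full copy of a possibly ZeRO-3-partitioned embedding weight.

    Sketch only: intended to replace the direct read at the top of
    CLIPVisionTower.resize_pos() in build_mlp.py.
    """
    # Under ZeRO-3 the local shard may have 0 elements; GatheredParameters
    # temporarily reassembles the full tensor on every rank.
    with deepspeed.zero.GatheredParameters(pos_embedding.weight):
        return pos_embedding.weight.data.clone()
```

If resize_pos also writes the interpolated embedding back, the gather would additionally need modifier_rank=0 so that the modification made on rank 0 is broadcast to all ranks on exit.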
Coobiw changed the title from "Offload CPU Error when fine-tuning InternLM-XComposer2" to "ZERO3 + Offload CPU Error when fine-tuning InternLM-XComposer2" on Jul 12, 2024