ZERO3 + Offload CPU Error when fine-tuning InternLM-XComposer2 #374

Open
Coobiw opened this issue Jul 12, 2024 · 0 comments
Coobiw commented Jul 12, 2024

Hi, thanks for your great work! I am fine-tuning InternLM-XComposer2 (the projection layer and the whole LLM are unfrozen; the ViT is frozen). To avoid OOM, I use ZeRO-3 and offload the optimizer to CPU (by changing https://github.com/InternLM/InternLM-XComposer/blob/main/InternLM-XComposer-2.0/finetune/ds_config_zero2.json#L17 to cpu). This raises the error below, while the original ds_config_zero2.json does not. How can I solve it? Thanks for your advice and reply!
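For reference, a minimal sketch of what the modified configuration presumably looks like, written as a Python dict mirroring the JSON file (the surrounding keys are assumptions; only the stage and the offload_optimizer device reflect the change described above):

```python
# Minimal sketch of the modified DeepSpeed config (assumed, not the exact
# repository file): ds_config_zero2.json with ZeRO stage 3 and the optimizer
# offloaded to CPU.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # raised from 2
        "offload_optimizer": {
            "device": "cpu",      # the L17 change described above
            "pin_memory": True,   # assumed; a common companion setting
        },
    },
}
```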

Error Message:

Traceback (most recent call last):
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/ChartLLM/InternLM-XComposer/finetune/finetune_smoe.py", line 396, in <module>
    train()
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/ChartLLM/InternLM-XComposer/finetune/finetune_smoe.py", line 297, in train
    model = transformers.AutoModelForCausalLM.from_pretrained(
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2966, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 506, in wrapper
    f(module, *args, **kwargs)
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 67, in __init__
    self.vit = build_vision_tower()
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/build_mlp.py", line 11, in build_vision_tower
    return CLIPVisionTower(vision_tower)
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 506, in wrapper
    f(module, *args, **kwargs)
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/build_mlp.py", line 59, in __init__
    self.resize_pos()
  File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/build_mlp.py", line 85, in resize_pos
    pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size,
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 24, 24, 0] because the unspecified dimension size -1 can be any value and is ambiguous
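The reshape fails because, with ZeRO-3, transformers builds the model inside deepspeed.zero.Init, so every parameter is partitioned at construction time, and the position-embedding weight that resize_pos reads is a zero-element placeholder (which is also why the last dimension of the target shape is 0). A possible workaround, shown as a sketch only (the helper name is hypothetical and gathering before reading is not the repository's confirmed fix, though deepspeed.zero.GatheredParameters is a real DeepSpeed API), is to materialize the parameter before reshaping it:

```python
import deepspeed
import torch

def materialized_weight(pos_embedding: torch.nn.Embedding) -> torch.Tensor:
    """Return a full copy of a possibly ZeRO-3-partitioned embedding weight.

    Sketch only: intended to replace the direct read at the top of
    CLIPVisionTower.resize_pos() in build_mlp.py.
    """
    # Under ZeRO-3 the local shard may have 0 elements; GatheredParameters
    # temporarily reassembles the full tensor on every rank.
    with deepspeed.zero.GatheredParameters(pos_embedding.weight):
        return pos_embedding.weight.data.clone()
```

If resize_pos also writes the interpolated embedding back, the gather would additionally need modifier_rank=0 so that the modification made on rank 0 is broadcast to all ranks on exit.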
Coobiw changed the title from "Offload CPU Error when fine-tuning InternLM-XComposer2" to "ZERO3 + Offload CPU Error when fine-tuning InternLM-XComposer2" on Jul 12, 2024