Hi, thanks for your great work! I am fine-tuning InternLM-XComposer2 (unfreezing the projector and the whole LLM, freezing the ViT). To avoid OOM, I use ZeRO-3 and offload the optimizer to CPU (by changing https://github.com/InternLM/InternLM-XComposer/blob/main/InternLM-XComposer-2.0/finetune/ds_config_zero2.json#L17 to cpu). This raises the error below, while the original ds_config_zero2.json does not. How can I solve it? Thanks for your advice and reply!
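For clarity, the change described above amounts to something like the fragment below. The field names follow the standard DeepSpeed ZeRO config schema; the exact surrounding keys in the repo's ds_config_zero2.json may differ, and `pin_memory` is shown only as a common companion setting, not something the original file is known to contain:

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```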
Error Message:
Traceback (most recent call last):
File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/ChartLLM/InternLM-XComposer/finetune/finetune_smoe.py", line 396, in <module>
train()
File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/ChartLLM/InternLM-XComposer/finetune/finetune_smoe.py", line 297, in train
model = transformers.AutoModelForCausalLM.from_pretrained(
File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2966, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 506, in wrapper
f(module, *args, **kwargs)
File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 67, in __init__
self.vit = build_vision_tower()
File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/build_mlp.py", line 11, in build_vision_tower
return CLIPVisionTower(vision_tower)
File "/data/FinAi_Mapping_Knowledge/qiyiyan/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 506, in wrapper
f(module, *args, **kwargs)
File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/build_mlp.py", line 59, in __init__
self.resize_pos()
File "/data/FinAi_Mapping_Knowledge/qiyiyan/qbw/cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/build_mlp.py", line 85, in resize_pos
pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size,
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 24, 24, 0] because the unspecified dimension size -1 can be any value and is ambiguous
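A plausible mechanism (an inference from the traceback, not confirmed by the maintainers): under ZeRO-3, `from_pretrained` constructs the model inside DeepSpeed's `zero.Init` context (the `partition_parameters.py` wrapper frames above), which partitions parameters at construction time. Inside `CLIPVisionTower.__init__`, `resize_pos` then sees an empty (0-element) position-embedding weight, derives an embedding dim of 0, and the reshape with `-1` becomes ambiguous. The pure-Python sketch below mimics how a `-1` dimension is inferred to show why a zero-size tensor triggers exactly this error; `infer_unknown_dim` is a hypothetical helper, not part of torch or the repo:

```python
def infer_unknown_dim(total_elems, shape):
    """Mimic reshape's '-1' inference: find x such that x * prod(known dims) == total_elems."""
    known = 1
    for d in shape:
        if d != -1:
            known *= d
    if known == 0:
        # total_elems is 0 and the -1 slot could take any value: ambiguous,
        # matching "unspecified dimension size -1 can be any value and is ambiguous".
        raise ValueError("cannot infer -1: ambiguous for a zero-size tensor")
    return total_elems // known

# Under ZeRO-3, the partitioned position-embedding weight can appear empty
# from inside __init__, so the embedding dim computed from it is 0:
partitioned_numel = 0            # what weight.numel() can return under zero.Init
orig_size, embed_dim = 24, 0     # embed_dim derived from the empty weight
try:
    infer_unknown_dim(partitioned_numel, (-1, orig_size, orig_size, embed_dim))
except ValueError as e:
    print("reshape would fail:", e)
```

If this diagnosis is right, one direction to explore is gathering the full weight before `resize_pos` runs, e.g. wrapping the call in `deepspeed.zero.GatheredParameters(...)`, or keeping the frozen ViT out of ZeRO-3 partitioning; whether either fits this codebase is something the maintainers would need to confirm.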
Coobiw changed the title from "Offload CPU Error when fine-tuning InternLM-XComposer2" to "ZERO3 + Offload CPU Error when fine-tuning InternLM-XComposer2" on Jul 12, 2024.