
Problem finetuning llama and baichuan with newer transformers versions #26816

Closed
gxy-gxy opened this issue Oct 15, 2023 · 6 comments

Comments

gxy-gxy commented Oct 15, 2023

When I tried to finetune a llama model on the ShareGPT dataset, I got these loss curves:
[figure: training loss curves; green = transformers 4.33.2, orange = transformers 4.28.1]
The green loss curve was trained with transformers 4.33.2 and the orange one with transformers 4.28.1. Obviously, the green one is abnormal and the orange one is correct. Why does this happen? The only thing I changed was the transformers version. Is this a bug in transformers, or did I do something wrong?

gxy-gxy commented Oct 15, 2023

I also observed this phenomenon when I tried to fine-tune a baichuan model.
Here is the loss curve trained with transformers 4.32:
[figure: baichuan training loss curve, transformers 4.32]

This is the loss curve trained with transformers 4.28:
[figure: baichuan training loss curve, transformers 4.28]

gxy-gxy commented Oct 15, 2023

I finetuned all the models above with the training code from the FastChat repository on A100-80G GPUs.
Here is my command:

torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train_xformers.py  \
    --model_name_or_path llama-7b \
    --data_path fschat.json \
    --bf16 True \
    --output_dir output \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --save_strategy "epoch" \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.04 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --model_max_length 4096 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb
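
The only thing I change between runs is the transformers version, roughly like this (a minimal sketch; the pins are just the versions mentioned above, and your environment manager may differ):

    # check which transformers version is currently active
    python -c "import transformers; print(transformers.__version__)"
    # the orange (normal) curve comes from the older release
    pip install "transformers==4.28.1"
    # the green (abnormal) curve comes from the newer release
    pip install "transformers==4.33.2"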

ArthurZucker (Collaborator) commented

Hey 🤗 thanks for opening an issue! We try to keep GitHub issues for bugs and feature requests. A similar issue is being tracked in #26498, where you can find some good tips!

Otherwise, could you ask your question on the forum instead? I'm sure the community will be of help!

Thanks!

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

xhan77 commented Nov 21, 2023

I was using transformers 4.33.2 (along with FSDP as implemented in PyTorch and the accelerate package from HF) and also observed this issue when pretraining llama from scratch: the loss quickly fails when using fsdp+bf16. There is no issue with fsdp+fp32 or ddp+bf16. I upgraded to 4.35.2 and the issue seems to be resolved, though I don't know the exact reason behind it.

Before upgrading transformers, I incorporated many tips from #26498, but they didn't help much in my case.
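
For anyone hitting the same failure mode, a rough sketch of the check/upgrade that worked on my side (the pin below is just the version that happened to fix it for me; torch and accelerate versions matter too since they provide fsdp and mixed precision):

    # see which versions of the relevant packages are installed
    pip list | grep -E "torch|accelerate|transformers"
    # upgrading transformers alone seemed to stabilize fsdp+bf16 in my case
    pip install --upgrade "transformers==4.35.2"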

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
