I encountered an issue while fine-tuning the TinyChart-3B-768 model using the TinyChartData dataset. The initial loss is unexpectedly high, reaching 7.6. Additionally, when using the original DeepSpeed script zero3_offload_decay.json, the loss remains constant at 0 throughout the training process.
Following the llava-v1.5 environment, I changed the DeepSpeed version according to pyproject.toml, and I ran vit_add_tome.py against TinyChart-3B-768-siglip.
Are there any dependencies or configurations that I might be missing which could cause the initial loss to be so high?
Why does the loss remain at 0 when using the original DeepSpeed script?
I would appreciate any guidance or insights into potential dependency issues or misconfigurations that could lead to these problems.
Thank you for your assistance.
Hi @buckerF,
We also sometimes encounter the zero-loss issue you mentioned. It is very likely that a NaN appears in the forward/backward pass due to fp16 precision. You can try switching to bf16 or fp32 instead.
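As a minimal sketch of that change (the exact keys depend on your DeepSpeed version, so treat this as an assumption rather than the project's official config), the precision block of the DeepSpeed JSON, e.g. zero3_offload_decay.json, would switch from fp16 to bf16 like this:

```json
{
  "fp16": { "enabled": false },
  "bf16": { "enabled": true }
}
```

If the training launcher also takes a precision flag (as LLaVA-style scripts typically do), it needs to match the JSON, e.g. passing `--bf16 True` instead of `--fp16 True`.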
We have not encountered the high-loss issue you mentioned. Since TinyChart has already been trained on this data, the initial loss should be low. Have you verified that the pre-trained parameters are loaded correctly?
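One quick way to verify that is to compare the checkpoint's parameter names against the model's. The helper below is a hypothetical sketch: in practice the two key sets would come from something like `torch.load(ckpt, map_location="cpu").keys()` and `model.state_dict().keys()`.

```python
def check_state_dict_keys(checkpoint_keys, model_keys):
    """Report parameters that would fail to load from a checkpoint.

    checkpoint_keys / model_keys: iterables of parameter names, e.g.
    from torch.load(ckpt).keys() and model.state_dict().keys().
    Returns (missing, unexpected): names the model needs but the
    checkpoint lacks, and names the checkpoint has but the model ignores.
    """
    ckpt, model = set(checkpoint_keys), set(model_keys)
    missing = sorted(model - ckpt)      # would stay randomly initialized
    unexpected = sorted(ckpt - model)   # would be silently dropped
    return missing, unexpected

# Example with dummy parameter names: a non-empty `missing` list
# (e.g. the vision tower or mm_projector) means part of the model was
# randomly initialized, which could explain an unexpectedly high
# initial loss like 7.6.
missing, unexpected = check_state_dict_keys(
    ["model.embed_tokens.weight", "lm_head.weight"],
    ["model.embed_tokens.weight", "lm_head.weight", "model.mm_projector.weight"],
)
print(missing)      # ['model.mm_projector.weight']
print(unexpected)   # []
```

Loading with `model.load_state_dict(sd, strict=True)` surfaces the same mismatches as an error, which is often the fastest way to catch a silently skipped module.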