
Initial Loss problem When Fine-Tuning TinyChart-3B-768 with TinyChartData #92

Open
buckerF opened this issue Jul 1, 2024 · 1 comment


buckerF commented Jul 1, 2024

I encountered an issue while fine-tuning the TinyChart-3B-768 model on the TinyChartData dataset. The initial loss is unexpectedly high, reaching 7.6. Additionally, when using the original DeepSpeed script zero3_offload_decay.json, the loss remains constant at 0 throughout training.
My environment follows llava-v1.5, with the DeepSpeed version changed to match pyproject.toml; I also ran vit_add_tome.py on TinyChart-3B-768-siglip.

Are there any dependencies or configurations that I might be missing which could cause the initial loss to be so high?
Why does the loss remain at 0 when using the original DeepSpeed script?

I would appreciate any guidance or insights into potential dependency issues or misconfigurations that could lead to these problems.

Thank you for your assistance.

zhangliang-04 (Collaborator) commented

Hi @buckerF,
We also sometimes encounter the zero-loss issue you mentioned. It is very likely caused by NaNs appearing in the forward/backward pass due to fp16 precision. You can try switching to bf16 or fp32 instead.
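A minimal sketch of the relevant DeepSpeed config change, assuming your GPU supports bf16 (the `bf16.enabled` and `fp16.enabled` fields follow the DeepSpeed config schema; the rest of zero3_offload_decay.json would stay as-is):

```json
{
  "bf16": { "enabled": true },
  "fp16": { "enabled": false },
  "zero_optimization": { "stage": 3 }
}
```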

We did not encounter the high-loss issue you mentioned. Since TinyChart has already been trained on this data, the initial loss should be low. Have you verified that the pre-trained parameters are loaded correctly?
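One quick way to verify that the pre-trained parameters actually loaded is to inspect the missing/unexpected keys reported by `load_state_dict(strict=False)`. A minimal sketch using a stand-in `nn.Linear` (with TinyChart you would run the same check against the real checkpoint, where renamed or absent keys often explain a high initial loss):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model; substitute the actual
# TinyChart model and checkpoint in practice.
model = nn.Linear(4, 2)

# Simulate a checkpoint that is missing a parameter (e.g. a renamed key):
# the "bias" entry is absent, so it stays randomly initialized.
ckpt = {"weight": torch.zeros(2, 4)}

result = model.load_state_dict(ckpt, strict=False)
print("missing:", result.missing_keys)        # keys the model expected but the checkpoint lacks
print("unexpected:", result.unexpected_keys)  # checkpoint keys the model did not expect
```

If either list is non-empty for weights you expect to be pre-trained, those modules are training from random initialization, which would explain an initial loss around 7.6.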
