I encountered an issue while fine-tuning the TinyChart-3B-768 model using the TinyChartData dataset. The initial loss is unexpectedly high, reaching 7.6. Additionally, when using the original DeepSpeed script zero3_offload_decay.json, the loss remains constant at 0 throughout the training process.
Following the llava-v1.5 environment, I changed the DeepSpeed version according to pyproject.toml, and I ran vit_add_tome.py against TinyChart-3B-768-siglip.
Are there any dependencies or configurations that I might be missing which could cause the initial loss to be so high?
Why does the loss remain at 0 when using the original DeepSpeed script?
I would appreciate any guidance or insights into potential dependency issues or misconfigurations that could lead to these problems.
Thank you for your assistance.
Hi @buckerF,
We also sometimes encounter the zero-loss issue you mentioned. It is very likely that a NaN appears in the forward/backward pass due to fp16 precision. You can try switching to bf16 or fp32 instead.
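As a minimal sketch of that change (the exact keys depend on your DeepSpeed version, so treat this as an assumption rather than the project's official config), the precision block of the DeepSpeed JSON, e.g. zero3_offload_decay.json, would switch from fp16 to bf16 like this:

```json
{
  "fp16": { "enabled": false },
  "bf16": { "enabled": true }
}
```

If the training launcher also takes a precision flag (as LLaVA-style scripts typically do), it needs to match the JSON, e.g. passing `--bf16 True` instead of `--fp16 True`.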
We have not encountered the high-loss issue you mentioned. Since TinyChart has already been trained on this data, the initial loss should be low. Have you verified that the pre-trained parameters are loaded correctly?
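One quick way to verify that is to compare the checkpoint's parameter names against the model's. The helper below is a hypothetical sketch: in practice the two key sets would come from something like `torch.load(ckpt, map_location="cpu").keys()` and `model.state_dict().keys()`.

```python
def check_state_dict_keys(checkpoint_keys, model_keys):
    """Report parameters that would fail to load from a checkpoint.

    checkpoint_keys / model_keys: iterables of parameter names, e.g.
    from torch.load(ckpt).keys() and model.state_dict().keys().
    Returns (missing, unexpected): names the model needs but the
    checkpoint lacks, and names the checkpoint has but the model ignores.
    """
    ckpt, model = set(checkpoint_keys), set(model_keys)
    missing = sorted(model - ckpt)      # would stay randomly initialized
    unexpected = sorted(ckpt - model)   # would be silently dropped
    return missing, unexpected

# Example with dummy parameter names: a non-empty `missing` list
# (e.g. the vision tower or mm_projector) means part of the model was
# randomly initialized, which could explain an unexpectedly high
# initial loss like 7.6.
missing, unexpected = check_state_dict_keys(
    ["model.embed_tokens.weight", "lm_head.weight"],
    ["model.embed_tokens.weight", "lm_head.weight", "model.mm_projector.weight"],
)
print(missing)      # ['model.mm_projector.weight']
print(unexpected)   # []
```

Loading with `model.load_state_dict(sd, strict=True)` surfaces the same mismatches as an error, which is often the fastest way to catch a silently skipped module.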