DOC Troubleshooting for unscaling error with fp16 #1336

Merged

Conversation

BenjaminBossan
Member

Some users ran into an error when trying to use a model loaded in float16 with mixed precision, e.g. in these issues: #341, #1249. This PR documents a workaround to solve the issue.

I also added tests that demonstrate the issue, as well as the workaround.

Notes

This is not strictly a PEFT issue, but rather a general error when using AMP with float16. Still, since PEFT users encounter this sometimes, it is useful to document it.

When we discussed this issue in the past, I think we concluded that the fix is not as straightforward as PEFT automatically casting the weights to float32, though I can no longer remember what the drawbacks were.

In any case, should we ever add an automatic solution for this in PEFT, the added test should fail, alerting us that the documentation needs to be updated.
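
For reference, here is a minimal sketch of the kind of workaround documented here, assuming a LoRA adapter on a small causal LM (the model name is only a placeholder):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Loading the base model in float16 is what later triggers
# "ValueError: Attempting to unscale FP16 gradients." under AMP.
base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder model
    torch_dtype=torch.float16,
)
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# Workaround: upcast only the trainable (adapter) parameters to float32
# so the GradScaler can unscale their gradients; the frozen base weights
# stay in float16.
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.float()
```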

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@younesbelkada younesbelkada left a comment

Makes sense, thanks for adding these clarifications!

Contributor

@pacman100 pacman100 left a comment

Thank you @BenjaminBossan for adding these troubleshooting details regarding mixed precision finetuning with PEFT.

@pacman100 pacman100 merged commit c6b28a2 into huggingface:main Jan 10, 2024
14 checks passed
@BenjaminBossan BenjaminBossan deleted the doc-troubleshooting-error-fp16 branch January 10, 2024 10:32
@hiyouga
Contributor

hiyouga commented Jan 13, 2024

Sorry, I have a question: will training a model loaded in fp16 with PEFT lead to bad convergence? I saw an issue about it: huggingface/transformers#28142

@BenjaminBossan
Member Author

Yes, generally it is better to use AMP if possible.
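
For context, "use AMP" here means running the forward pass under autocast while the trainable weights (and optimizer state) stay in fp32, rather than training everything in pure fp16. A rough sketch of a manual training step (assuming a CUDA device; the function and variable names are illustrative):

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def training_step(model, batch, optimizer):
    optimizer.zero_grad()
    # The forward pass runs in float16 via autocast ...
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(**batch).loss
    scaler.scale(loss).backward()
    # ... while the optimizer step operates on fp32 master weights.
    # This is where "Attempting to unscale FP16 gradients." is raised
    # if the trainable parameters themselves are float16.
    scaler.step(optimizer)
    scaler.update()
```

Training the weights directly in fp16 without AMP, in contrast, is the setup associated with the convergence problems discussed in the linked issue.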

@hiyouga
Contributor

hiyouga commented Feb 6, 2024

Sorry for disturbing you again, I might not have described my question properly. If I enable AMP and use the above workaround to convert the LoRA weights to fp32 while keeping the other parameters in fp16, will it also lead to the instability problem mentioned in huggingface/transformers#28142? I am not willing to convert all of the parameters to fp32, since that might double the GPU memory usage.

@BenjaminBossan
Member Author

I see what you mean. In that case it should generally be fine: we only want to avoid having the trainable parameters in fp16; the other parameters can stay in fp16. There is still a tiny chance that having fp16 instead of fp32 for those other parameters leads to a noticeable loss of performance, which is always a risk when using lower precision. But empirically, AFAIK, it works well.
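
To make the memory point concrete, here is a small sketch (reusing the `model` from the sketch above) that tallies parameters by dtype and trainability; after the upcast, only the small LoRA adapter should be trainable fp32, while the large frozen base model remains fp16:

```python
from collections import Counter

dtype_counts = Counter()
for param in model.parameters():
    dtype_counts[(param.dtype, param.requires_grad)] += param.numel()

for (dtype, trainable), numel in dtype_counts.items():
    kind = "trainable" if trainable else "frozen"
    print(f"{kind:>9} {dtype}: {numel / 1e6:.2f}M params")
# Expected: a few fp32 trainable params (the LoRA adapters) and the bulk
# of the parameters frozen in fp16, so GPU memory stays roughly unchanged.
```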

hiyouga referenced this pull request in hiyouga/LLaMA-Factory May 16, 2024