DOC Troubleshooting for unscaling error with fp16 #1336

Merged

Conversation

BenjaminBossan
Member

Some users ran into an error when trying to use a model loaded in float16 with mixed precision, e.g. in these issues: #341, #1249. This PR documents a workaround to solve the issue.

I also added tests that demonstrate the issue, as well as the workaround.

Notes

This is not strictly a PEFT issue, but rather a general error when using AMP with float16. Still, since PEFT users encounter this sometimes, it is useful to document it.

When we discussed this issue in the past, I think we concluded that the fix is not as straightforward as PEFT automatically casting the weights to float32, though I can no longer remember what the drawbacks were.

In any case, should we ever add an automatic solution for this in PEFT, the added test should fail, alerting us that the documentation needs to be updated.
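
For reference, here is a minimal sketch of the kind of workaround documented here, assuming a LoRA adapter on a small causal LM (the model name is only a placeholder):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Loading the base model in float16 is what later triggers
# "ValueError: Attempting to unscale FP16 gradients." under AMP.
base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder model
    torch_dtype=torch.float16,
)
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# Workaround: upcast only the trainable (adapter) parameters to float32
# so the GradScaler can unscale their gradients; the frozen base weights
# stay in float16.
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.float()
```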

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@younesbelkada younesbelkada left a comment

Makes sense, thanks for adding these clarifications!

Contributor

@pacman100 pacman100 left a comment

Thank you @BenjaminBossan for adding these troubleshooting details regarding mixed precision finetuning with PEFT.

@pacman100 pacman100 merged commit c6b28a2 into huggingface:main Jan 10, 2024
14 checks passed
@BenjaminBossan BenjaminBossan deleted the doc-troubleshooting-error-fp16 branch January 10, 2024 10:32
@hiyouga
Contributor

hiyouga commented Jan 13, 2024

Sorry, I have a question: will training a model loaded in fp16 with PEFT lead to bad convergence? I saw an issue about it: huggingface/transformers#28142

@BenjaminBossan
Member Author

Yes, generally it is better to use AMP if possible.
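
For context, "use AMP" here means running the forward pass under autocast while the trainable weights (and optimizer state) stay in fp32, rather than training everything in pure fp16. A rough sketch of a manual training step (assuming a CUDA device; the function and variable names are illustrative):

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def training_step(model, batch, optimizer):
    optimizer.zero_grad()
    # The forward pass runs in float16 via autocast ...
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(**batch).loss
    scaler.scale(loss).backward()
    # ... while the optimizer step operates on fp32 master weights.
    # This is where "Attempting to unscale FP16 gradients." is raised
    # if the trainable parameters themselves are float16.
    scaler.step(optimizer)
    scaler.update()
```

Training the weights directly in fp16 without AMP, in contrast, is the setup associated with the convergence problems discussed in the linked issue.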

@hiyouga
Contributor

hiyouga commented Feb 6, 2024

Sorry for disturbing you again, I might not have described my question properly. If I enable AMP and use the above workaround to convert the LoRA weights to fp32 while keeping the other parameters in fp16, will it also lead to the instability problem mentioned in huggingface/transformers#28142? I am not willing to convert all of the parameters to fp32, since that might double the GPU memory usage.

@BenjaminBossan
Member Author

I see what you mean. In that case it should generally be fine: we only want to avoid having the trainable parameters in fp16; the other parameters can stay in fp16. There is still a tiny chance that having fp16 instead of fp32 for those other parameters leads to a noticeable loss of performance, which is always a risk when using lower precision. But empirically, AFAIK, it works well.
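
To make the memory point concrete, here is a small sketch (reusing the `model` from the sketch above) that tallies parameters by dtype and trainability; after the upcast, only the small LoRA adapter should be trainable fp32, while the large frozen base model remains fp16:

```python
from collections import Counter

dtype_counts = Counter()
for param in model.parameters():
    dtype_counts[(param.dtype, param.requires_grad)] += param.numel()

for (dtype, trainable), numel in dtype_counts.items():
    kind = "trainable" if trainable else "frozen"
    print(f"{kind:>9} {dtype}: {numel / 1e6:.2f}M params")
# Expected: a few fp32 trainable params (the LoRA adapters) and the bulk
# of the parameters frozen in fp16, so GPU memory stays roughly unchanged.
```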

hiyouga referenced this pull request in hiyouga/LLaMA-Factory May 16, 2024