RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16 #844
Comments
I'm able to resolve the issue by casting the model to bf16, but I'm not sure if this is the best way to do it in this codebase.
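The casting fix described above can be sketched as a minimal repro (illustrative only, not axolotl's actual code): a bf16 linear layer fed a default-float32 input raises the reported error, and casting to bf16 resolves it.

```python
import torch

# Minimal repro of the reported error: a bf16 layer fed a float32 input.
layer = torch.nn.Linear(4, 4).to(torch.bfloat16)
x = torch.randn(2, 4)  # float32 by default

try:
    layer(x)
except RuntimeError as err:
    # "expected mat1 and mat2 to have the same dtype, ..."
    print("mismatch:", err)

# The fix from the comment above: cast to bf16 so dtypes agree.
out = layer(x.to(torch.bfloat16))
print(out.dtype)
```

In practice one would cast the whole model (`model.to(torch.bfloat16)`) rather than a single layer, but the dtype-agreement principle is the same.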
@griff4692 my guess is the issue is in the DeepSpeed JSON configuration used during training.
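For reference, bf16 training is enabled in a DeepSpeed JSON config with a block like the following (a generic fragment, not taken from the reporter's actual config):

```json
{
  "bf16": {
    "enabled": true
  }
}
```

If this section is missing or disabled while the rest of the pipeline assumes bf16, checkpoints and inference-time inputs can end up with mismatched dtypes.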
Hi, I tested on 13/12/23 and the same issue still appears (tested with Mistral).
This issue is caused in the linear layer. Basically, there's a dtype mismatch there, and I think that's where the fix needs to go: we can add a cast to the appropriate dtype via the model config. Let me know what you think; I can make a PR.
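The proposed fix can be sketched as a small helper (a hypothetical sketch, not axolotl's actual code): before the forward pass, cast floating-point inputs to the dtype the model was trained in. Here the dtype is read from the model's parameters; reading it from the model config, as the comment suggests, would work the same way.

```python
import torch

def cast_inputs_to_model_dtype(model, inputs):
    # Cast only floating-point tensors; integer tensors (e.g. token ids)
    # must keep their dtype.
    target = next(model.parameters()).dtype
    return {k: (v.to(target) if torch.is_floating_point(v) else v)
            for k, v in inputs.items()}

# Usage with an illustrative bf16 "model" and a mixed-dtype batch:
model = torch.nn.Linear(8, 8).to(torch.bfloat16)
batch = {"x": torch.randn(2, 8), "ids": torch.arange(2)}
batch = cast_inputs_to_model_dtype(model, batch)
print(batch["x"].dtype, batch["ids"].dtype)  # torch.bfloat16 torch.int64
```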
Closed, thanks to @taziksh.
Please check that this issue hasn't been reported before.
Expected Behavior
I fine-tuned Mistral with axolotl using bf16 precision. I want to generate from this fine-tuned model:
`/path-to-my-fined-tuned-checkpoint/checkpoint-500`
Current behaviour
There is a `dtype` mismatch.
Steps to reproduce
Config yaml
Possible solution
I tried with `torch.cuda.amp.autocast()`, but that did not work.
Which Operating Systems are you using?
Python Version
3.9
axolotl branch-commit
main/f544ab2bed513bef269e6887d35c8aa12a852473
Acknowledgements