Support For Phi1 and Phi1.5 #558
Comments
Yes, looking forward to support for phi-1.5.
With examples too, please!
Just a few notes: you'll need to use deepspeed to train Phi. There are some features in their custom modeling code that aren't supported without deepspeed.
Is fp16 supported by phi-1.5? I have V100s. For me, neither works.
I wouldn't expect flash attention to work on non-llama models.
https://gist.github.com/jphme/d9e09b9d285e2ec03b21c81a98f8dc37 — but in this config file for fine-tuning phi-1.5, they did use flash attention.
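For reference, the relevant lines of a phi-1.5 fine-tuning config like the one in that gist might look roughly like this (a sketch only; the key names assume axolotl's config schema, and the values are illustrative, not taken from the gist):

```yaml
# sketch of a phi-1.5 fine-tuning config (illustrative values)
base_model: microsoft/phi-1_5
trust_remote_code: true   # phi ships custom modeling code on the Hub
sequence_len: 2048
bf16: true                # bf16 requires Ampere-or-newer GPUs
flash_attention: true     # whether this actually works for non-llama models is the open question above
```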
It won't let me load the model in fp16, and bf16 doesn't work. I run into this error using fp16; currently clueless lol.
It won't be enabled if it's a non-llama model.
Could you provide the config for this?
hardware:
Hey, any updates? I still get the error even while running the new config file uploaded as the example phi/phi-ft.yml with the following change:
Check out the latest version.
I am on V100 GPUs that do not support bf16. Are there any workarounds?
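One possible workaround on pre-Ampere cards is to disable bf16 in the config and train in fp16 instead (key names again assume axolotl's schema; whether phi's custom modeling code tolerates fp16 is exactly what's being debugged in this thread):

```yaml
# sketch: precision settings for V100s, which lack bf16 support
bf16: false
fp16: true
flash_attention: false  # flash-attn kernels don't support Volta (V100) either
```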
Maybe you need to show us the error, or use xformers once it supports Phi.
Here's the error log: https://pastebin.com/T9CgqBXA
Maybe try deepspeed zero1...
You could open a separate issue and more people will be able to help you.
@dongxiaolong the issue is now sorted, thanks for the help. I just had to enable deepspeed zero1.json.
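For anyone hitting the same thing: a minimal deepspeed ZeRO stage-1 config along the lines of the zero1.json mentioned here could look like the sketch below. The `auto` values let the launcher fill in batch sizes and precision; the exact contents of the file shipped with axolotl may differ.

```json
{
  "zero_optimization": {
    "stage": 1
  },
  "bf16": { "enabled": "auto" },
  "fp16": { "enabled": "auto" },
  "gradient_accumulation_steps": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

It gets wired up by pointing the trainer's deepspeed option at this file, e.g. via a `deepspeed:` key in the axolotl config.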
🔖 Feature description
The model itself is licensed for non-commercial use only, but is claimed to be superior on multiple CoT benchmarks.
https://huggingface.co/microsoft/phi-1_5
https://huggingface.co/microsoft/phi-1
https://arxiv.org/pdf/2309.05463.pdf
✔️ Solution
The main work would likely be adding support for MixFormerSequentialForCausalLM and a training config for it.
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements