
Support For Phi1 and Phi1.5 #558

Closed
5 tasks done
dongxiaolong opened this issue Sep 12, 2023 · 16 comments · Fixed by #569
Labels
enhancement (New feature or request)

Comments

@dongxiaolong
Contributor

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

The model itself is licensed for non-commercial use only, but is claimed to be superior on multiple CoT benchmarks.
https://huggingface.co/microsoft/phi-1_5
https://huggingface.co/microsoft/phi-1
https://arxiv.org/pdf/2309.05463.pdf

✔️ Solution

The main work would likely be adding support for MixFormerSequentialForCausalLM and a training config for it.

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
dongxiaolong added the enhancement (New feature or request) label on Sep 12, 2023
@ryanshrott

Yes, looking forward to support for phi-1.5

@adarshxs
Contributor

With examples too please!

@winglian
Collaborator

Just a few notes: you'll need to use DeepSpeed to train Phi, since there are some features in its custom modeling code that aren't supported without DeepSpeed. sample_packing is not supported. Use resize_token_embeddings_to_32x: true.
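
As a rough sketch, the Phi-specific parts of a config might look something like this (the base_model, trust_remote_code, and deepspeed lines are illustrative additions; only sample_packing and resize_token_embeddings_to_32x are the settings called out above):

base_model: microsoft/phi-1_5
trust_remote_code: true                # Phi ships custom modeling code (MixFormerSequentialForCausalLM)
sample_packing: false                  # sample packing is not supported for Phi
resize_token_embeddings_to_32x: true   # resize token embeddings to the next multiple of 32
deepspeed: deepspeed/zero1.json        # illustrative path; point it at whichever DeepSpeed config you use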

@adarshxs
Contributor

adarshxs commented Sep 13, 2023

Is fp16 supported by Phi-1.5? I have V100s. For me, neither flash-attn nor bf16 works; xformers attention is supported, though.

@ehartford
Collaborator

I wouldn't expect flash attention to work on non-Llama models.

@adarshxs
Contributor

adarshxs commented Sep 13, 2023

I wouldn't expect flash attention to work on non-Llama models.

But in this config file for fine-tuning Phi-1.5 (https://gist.github.com/jphme/d9e09b9d285e2ec03b21c81a98f8dc37), they did use flash attention:

flash_attention: true

It won't let me load the model in fp16, and bf16 doesn't work. (I run into this error using fp16: Attempting to unscale FP16 gradients.)

Currently clueless lol

@NanoCode012
Collaborator

in this config file for fine-tuning Phi-1.5, they did use flash attention

It won't be enabled if it's a non-Llama model.

it won't let me load the model in fp16, and bf16 doesn't work (I run into this error using fp16: Attempting to unscale FP16 gradients)

Could you provide the config for this?

@adarshxs
Contributor

https://pastebin.com/FChjuRBm

Hardware: Tesla V100 32GB
Docker: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch (nvcr.io/nvidia/pytorch:23.02-py3)

@adarshxs
Contributor

Hey, any updates? I still get the error even when running the new example config phi/phi-ft.yml with the following change:

bf16: false
fp16: true
tf32: false

@dongxiaolong
Contributor Author

Check out the latest version. The latest example uses the settings below; I used them as well and it worked properly:

bf16: true
fp16: false
tf32: true

@adarshxs
Contributor

I am on V100 GPUs, which do not support bf16. Are there any workarounds?

@dongxiaolong
Contributor Author

Maybe you need to show us the error, or use xformers once it supports Phi.

@adarshxs
Contributor

Here's the error log: https://pastebin.com/T9CgqBXA

@dongxiaolong
Contributor Author

Maybe try DeepSpeed zero1...

@dongxiaolong
Contributor Author

You could open a separate issue, and more people will be able to help you.

@adarshxs
Contributor

adarshxs commented Sep 14, 2023

@dongxiaolong the issue is now sorted, thanks for the help. I just had to enable DeepSpeed with zero1.json.
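
For anyone else on V100s hitting the same fp16 error, the fix is roughly this combination (assuming the config's deepspeed key takes a path to a DeepSpeed JSON file; the zero1.json path below is illustrative):

bf16: false
fp16: true
tf32: false
deepspeed: deepspeed/zero1.json   # ZeRO stage-1 config; adjust the path to wherever yours lives

With DeepSpeed enabled, its engine handles the fp16 loss scaling instead of the Trainer's GradScaler, which seems to be why the "Attempting to unscale FP16 gradients" error goes away.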
