Support For Phi1 and Phi1.5 #558
Comments
Yes, looking forward to support for phi-1.5.
With examples too, please!
Just a few notes: you'll need to use deepspeed to train Phi. There are some features in their custom modeling code that aren't supported without deepspeed.
Is fp16 supported by phi-1.5? I have V100s. For me, neither works.
I wouldn't expect flash attention to work on non-llama models.
https://gist.github.com/jphme/d9e09b9d285e2ec03b21c81a98f8dc37 — but in this config file for fine-tuning phi-1.5, they did use flash attention.
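For reference, the relevant lines of a phi-1.5 fine-tuning config like the one in that gist might look roughly like this (a sketch only; the key names assume axolotl's config schema, and the values are illustrative, not taken from the gist):

```yaml
# sketch of a phi-1.5 fine-tuning config (illustrative values)
base_model: microsoft/phi-1_5
trust_remote_code: true   # phi ships custom modeling code on the Hub
sequence_len: 2048
bf16: true                # bf16 requires Ampere-or-newer GPUs
flash_attention: true     # whether this actually works for non-llama models is the open question above
```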
It won't let me load the model in fp16, and bf16 doesn't work. I run into this error using fp16; currently clueless lol.
It won't be enabled if it's a non-llama model.
Could you provide the config for this?
hardware:
Hey, any updates? I still get the error even while running the new config file uploaded as the example phi/phi-ft.yml with the following change:
Check out the latest version.
I am on V100 GPUs that do not support bf16. Are there any workarounds?
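One possible workaround on pre-Ampere cards is to disable bf16 in the config and train in fp16 instead (key names again assume axolotl's schema; whether phi's custom modeling code tolerates fp16 is exactly what's being debugged in this thread):

```yaml
# sketch: precision settings for V100s, which lack bf16 support
bf16: false
fp16: true
flash_attention: false  # flash-attn kernels don't support Volta (V100) either
```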
Maybe you need to show us the error, or use xformers once it supports Phi.
Here's the error log: https://pastebin.com/T9CgqBXA
Maybe try deepspeed zero1...
You could open a separate issue and more people will be able to help you.
@dongxiaolong the issue is now sorted, thanks for the help. I just had to enable deepspeed zero1.json.
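For anyone hitting the same thing: a minimal deepspeed ZeRO stage-1 config along the lines of the zero1.json mentioned here could look like the sketch below. The `auto` values let the launcher fill in batch sizes and precision; the exact contents of the file shipped with axolotl may differ.

```json
{
  "zero_optimization": {
    "stage": 1
  },
  "bf16": { "enabled": "auto" },
  "fp16": { "enabled": "auto" },
  "gradient_accumulation_steps": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

It gets wired up by pointing the trainer's deepspeed option at this file, e.g. via a `deepspeed:` key in the axolotl config.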
🔖 Feature description
The model itself is licensed for non-commercial use only, but is claimed to be superior on multiple CoT benchmarks.
https://huggingface.co/microsoft/phi-1_5
https://huggingface.co/microsoft/phi-1
https://arxiv.org/pdf/2309.05463.pdf
✔️ Solution
The main work would likely be adding support for MixFormerSequentialForCausalLM and a training config for it.
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements