Feat: Add Qwen #894
Conversation
LGTM
I was testing this branch with an old version of transformers. In the one we pinned, setting gradient checkpointing would cause an error. There are a few ways I see to fix this:
This is only fixed in main: huggingface/transformers@c13a43a
I added a warning instead and updated the examples. We should remove the warning once we update past 4.35.2.
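A minimal sketch of the version-gated warning described above (function and config names here are assumptions for illustration, not the PR's actual code):

```python
# Sketch: warn and disable gradient checkpointing for Qwen on transformers
# releases that still have the bug. The upstream fix
# (huggingface/transformers@c13a43a) is only in main, so every release up
# to and including the pinned 4.35.2 is affected.
import warnings


def _ver(v: str) -> tuple:
    # "4.35.2" -> (4, 35, 2); pre-release suffixes are ignored for simplicity
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def qwen_grad_ckpt_supported(transformers_version: str) -> bool:
    """True once we are past 4.35.2, where the Qwen fix exists."""
    return _ver(transformers_version) > (4, 35, 2)


def maybe_disable_grad_ckpt(cfg: dict, transformers_version: str) -> dict:
    """Warn and turn gradient checkpointing off on affected versions."""
    if cfg.get("gradient_checkpointing") and not qwen_grad_ckpt_supported(
        transformers_version
    ):
        warnings.warn(
            "Qwen gradient checkpointing is broken on transformers "
            "<= 4.35.2; disabling it. Remove this check after upgrading."
        )
        cfg["gradient_checkpointing"] = False
    return cfg
```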
Tested working in Colab. Didn't run inference, but this should be a good start.
I tried to run both the qlora and lora examples with your branch, but I get this error: Traceback (most recent call last):
Hey @CheshireAI, this error is most likely due to sample packing. The dataset you used is too small; alternatively, you can turn off sample_packing.
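To illustrate why a small dataset trips up sample packing (this is a simplified sketch, not axolotl's internal packer): packing greedily concatenates tokenized examples into fixed-length sequences, so with too few or too-short examples the packed dataset can come out empty or smaller than a batch, which surfaces as a collator error like the one above.

```python
def pack_examples(lengths, max_seq_len):
    """Greedy sequential packing: group example indices so that the token
    lengths in each group sum to at most max_seq_len."""
    bins, current, used = [], [], 0
    for i, n in enumerate(lengths):
        if n > max_seq_len:
            continue  # example can never fit in one sequence; skipped
        if used + n > max_seq_len:
            bins.append(current)  # close the current packed sequence
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        bins.append(current)
    return bins
```

With three examples of 10 tokens and `max_seq_len=25`, this yields only two packed sequences; shrink the dataset further (or raise the sequence length) and you can end up with fewer packed rows than a batch needs.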
Force-pushed from b8ed116 to a01b3c3
Do you have a working example? Neither of the examples is working for me.
For future reference, following discussions on Discord: the error was due to flash attention being incompatible with adapters for Qwen.
* Feat: Add Qwen
* feat: add qwen lora example
* feat: update matrix
* fix: add trust_remote_code
* fix: disable gradient checkpointing
* chore: add warning about gradient checkpointing
* fix: config
* fix: turn off sample packing for this example and reduce seq len
* chore: add comment on seq len
Got past the collator issue. Seems to run fine.
Please let me know if there are any other errors. I've set the default token ids to their EOD token. If there are unintended consequences during inference, we may need to default bos/eos to the ones used in their Qwen-chat.
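A sketch of the defaulting described above (the helper, the stand-in `TokenizerConfig` class, and the exact token strings are my assumptions; verify them against Qwen's actual tokenizer before relying on this): missing special tokens default to the EOD token, with the chat-style `<|im_start|>`/`<|im_end|>` pair as the fallback if inference misbehaves.

```python
class TokenizerConfig:
    """Minimal stand-in for the special-token fields being defaulted."""
    def __init__(self):
        self.bos_token = None
        self.eos_token = None
        self.pad_token = None


EOD = "<|endoftext|>"                             # Qwen's end-of-document token
IM_START, IM_END = "<|im_start|>", "<|im_end|>"   # markers used by Qwen-chat


def apply_qwen_defaults(tok, chat_style=False):
    """Default any missing special tokens to EOD; with chat_style=True,
    use the Qwen-chat markers for bos/eos instead."""
    tok.bos_token = tok.bos_token or (IM_START if chat_style else EOD)
    tok.eos_token = tok.eos_token or (IM_END if chat_style else EOD)
    tok.pad_token = tok.pad_token or EOD
    return tok
```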
Reference: