
add support to extend context with xpos rope #181

Merged
winglian merged 2 commits into main from xpos-rope on Jun 10, 2023

Conversation

winglian (Collaborator)

I'm able to generate a PoC with openllama extended to 4k with this (will release soon)

@@ -127,6 +127,14 @@ def load_model(
# TODO: Check if this would overwrite previous additional_special_tokens
tokenizer.add_special_tokens({"additional_special_tokens": [MEM_TOKEN]})

if cfg.is_llama_derived_model and cfg.xpos_rope:
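The quoted hunk cuts off at the new guard. As a rough, hedged sketch of the mechanism such a branch would rely on (not necessarily this PR's exact code): monkeypatch transformers' LLaMA module so that models constructed afterwards build an xPos-style rotary embedding instead of the stock one.

# Sketch only: `xpos_cls` stands in for whatever xPos rotary-embedding class the
# patch provides (one possible implementation is sketched later in the thread).
import transformers.models.llama.modeling_llama as modeling_llama

def replace_llama_rope_with_xpos_rope(xpos_cls):
    # In 2023-era transformers, LlamaAttention looks up `LlamaRotaryEmbedding`
    # from the module's globals when it is constructed, so rebinding the name
    # makes later-built models use the xPos variant instead.
    modeling_llama.LlamaRotaryEmbedding = xpos_cls

Presumably the patch has to run before the weights are instantiated, which is why the guard sits inside load_model.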
Collaborator
Perhaps it should be an elif continuing from the checks above. Maybe we shouldn't allow multiple patches at once?

winglian (Collaborator, Author)
It works with xformers_attention; this patch is specific to the rotary embedding layer, which most of the patches above don't touch.

Collaborator
Oh ok!
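For background on "specific to the rotary embedding layer": xPos keeps the usual rotary rotation but additionally multiplies queries by a per-frequency decay factor and keys by its inverse, so the whole change lives in how the rotary embedding module produces its tensors. A minimal illustrative sketch, using the constants commonly quoted for xPos (gamma = 0.4, scale_base = 512) rather than anything taken from this PR:

import torch

class XposRotaryEmbedding(torch.nn.Module):
    # Illustrative xPos-style rotary embedding, not the PR's actual implementation.
    def __init__(self, dim, base=10000, scale_base=512):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # per-frequency decay factors; 0.4 is the gamma commonly used for xPos
        scale = (torch.arange(0, dim, 2).float() + 0.4 * dim) / (1.4 * dim)
        self.register_buffer("inv_freq", inv_freq)
        self.register_buffer("scale", scale)
        self.scale_base = scale_base

    def forward(self, seq_len):
        t = torch.arange(seq_len, dtype=self.inv_freq.dtype, device=self.inv_freq.device)
        freqs = torch.outer(t, self.inv_freq)          # (seq_len, dim // 2)
        power = (t - seq_len // 2) / self.scale_base   # center the decay on the window
        scale = self.scale ** power[:, None]           # (seq_len, dim // 2)
        # queries are rotated and multiplied by `scale`; keys use `1 / scale`
        return freqs.cos(), freqs.sin(), scale

Because the change is isolated to the rotary embedding, it can coexist with attention-level patches such as xformers_attention, which is the point of the reply above.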

Collaborator
Could you also add this to the README, please?

winglian merged commit 41e4f6c into main on Jun 10, 2023
winglian deleted the xpos-rope branch on Jun 13, 2023 at 17:04

mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request on Dec 15, 2023: …/xpos-rope (add support to extend context with xpos rope)