Remove redundant transposes for rope rotation #807

Conversation

@ShashankMosaicML commented Dec 16, 2023

Now that we use transformers v4.36.0, we no longer need to transpose the queries and keys back and forth to apply RoPE rotations to them. This PR removes those redundant transposes.

We can see that the loss and MFU plots are almost identical with and without this change:

[Screenshots: loss and MFU curves for runs with and without the change]
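For reference, a minimal sketch of the transpose-free call, assuming the Hugging Face `apply_rotary_pos_emb` from transformers 4.36 and a `(batch, seq_len, n_heads, head_dim)` query/key layout. The tensor names and shapes below are illustrative, not the PR's exact code:

```python
# Sketch: applying RoPE without transposing to (batch, n_heads, seq_len, head_dim),
# assuming transformers==4.36, where apply_rotary_pos_emb accepts `unsqueeze_dim`.
import torch
from transformers.models.llama.modeling_llama import (
    LlamaRotaryEmbedding,
    apply_rotary_pos_emb,
)

batch, seq_len, n_heads, head_dim = 2, 16, 4, 64

# Query/key in (batch, seq_len, n_heads, head_dim) layout.
query = torch.randn(batch, seq_len, n_heads, head_dim)
key = torch.randn(batch, seq_len, n_heads, head_dim)

rotary_emb = LlamaRotaryEmbedding(head_dim, max_position_embeddings=seq_len)
cos, sin = rotary_emb(query, seq_len=seq_len)  # each of shape (seq_len, head_dim)
position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch, -1)

# unsqueeze_dim=2 makes cos/sin broadcast over the heads dimension (dim 2), so there
# is no need to transpose to (batch, n_heads, seq_len, head_dim) and back.
query, key = apply_rotary_pos_emb(query, key, cos, sin, position_ids, unsqueeze_dim=2)
```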

@ShashankMosaicML marked this pull request as ready for review December 16, 2023 07:35
@vchiley left a comment


Should we gate on HF version? Or is that the min version of HF that we install?

@ShashankMosaicML

> Should we gate on HF version? Or is that the min version of HF that we install?

Yes, the minimum version of HF we install now is 4.36 (the pin is `'transformers>=4.36,<4.37',`). Hence, we do not need to gate, since the `unsqueeze_dim` argument was added in version 4.35.

@dakinggg

@ShashankMosaicML Let's gate on the transformers version please. Since this is in the model code, it'd be good to make it compatible with as many transformers versions as possible.

@ShashankMosaicML

> @ShashankMosaicML Let's gate on the transformers version please. Since this is in the model code, it'd be good to make it compatible with as many transformers versions as possible.

Done.
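For context, a minimal sketch of such a version gate, assuming `packaging` is available (it is a transformers dependency); the helper name `is_transformers_version_gte` is hypothetical and not necessarily what the PR used:

```python
# Hedged sketch of gating on the installed transformers version.
from packaging import version

import transformers


def is_transformers_version_gte(min_version: str) -> bool:
    """Return True if the installed transformers is at least `min_version`."""
    return version.parse(transformers.__version__) >= version.parse(min_version)


# Only take the transpose-free path when `unsqueeze_dim` can be relied on
# (>=4.36 per the install pin; the discussion notes the argument appeared in 4.35).
if is_transformers_version_gte('4.36.0'):
    print('apply_rotary_pos_emb accepts unsqueeze_dim; skip the extra transposes')
else:
    print('older transformers: transpose to (b, h, s, d), apply RoPE, transpose back')
```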

Review thread on llmfoundry/models/layers/attention.py: resolved (outdated)
ShashankMosaicML and others added 2 commits December 19, 2023 17:27
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
@ShashankMosaicML merged commit 289536b into mosaicml:main Dec 20, 2023
10 checks passed
@ShashankMosaicML deleted the shashank/fix_redundant_transposes_rope branch December 20, 2023 04:13