Update replacing MultiHeadAttention with GroupQueryAttention #19882

Merged

Conversation

kunal-vaishnavi
Contributor

Description

This PR updates the replacement of MultiHeadAttention (MHA) with GroupQueryAttention (GQA). It is related to the changes in PR #18906.
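
For context, the replacement targets ONNX Runtime's `com.microsoft.GroupQueryAttention` contrib op. Below is a minimal, hand-written sketch of what a fused GQA node could look like after this replacement, built with `onnx.helper`; the input layout, attribute values, and tensor names (`packed_qkv`, the caches) are illustrative assumptions, not code from this PR:

```python
from onnx import helper

# Hypothetical fused GQA node: rotary embeddings and the sliding window
# are handled inside the op instead of by separate upstream nodes.
gqa_node = helper.make_node(
    "GroupQueryAttention",
    inputs=[
        "packed_qkv",             # packed Q/K/V from one MatMul (assumed name)
        "",                       # key   - empty when QKV is packed
        "",                       # value - empty when QKV is packed
        "past_key",
        "past_value",
        "seqlens_k",
        "total_sequence_length",
        "cos_cache",              # rotary caches consumed directly by GQA
        "sin_cache",
    ],
    outputs=["attn_output", "present_key", "present_value"],
    name="GroupQueryAttention_0",
    domain="com.microsoft",
    num_heads=32,
    kv_num_heads=8,
    do_rotary=1,              # rotary embedding fused into GQA
    local_window_size=4096,   # sliding-window attention applied within GQA
)
```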

Motivation and Context

The updated replacement of MHA with GQA includes the following fusion changes:

  • Apply the sliding window within GQA
  • Fuse the rotary embeddings within GQA
  • Fuse the 3 MatMuls into 1 packed MatMul if possible
  • Fuse the 3 Adds into 1 packed Add if possible (the packing idea behind the last two items is sketched below)
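
As a rough illustration of the packing, here is a minimal NumPy sketch (not the fusion code itself) showing why the three Q/K/V projections can collapse into one MatMul and one Add: concatenating the weight matrices along the output axis makes a single matrix product compute the same columns, and splitting the packed result recovers identical Q, K, and V. The toy shapes are arbitrary:

```python
import numpy as np

# Toy shapes: 8 query heads, 2 KV heads, head size 64 (GQA-style).
hidden, num_heads, kv_heads, head = 512, 8, 2, 64
rng = np.random.default_rng(0)

x = rng.standard_normal((3, hidden))                 # (tokens, hidden)
Wq = rng.standard_normal((hidden, num_heads * head))
Wk = rng.standard_normal((hidden, kv_heads * head))
Wv = rng.standard_normal((hidden, kv_heads * head))
bq, bk, bv = (rng.standard_normal(W.shape[1]) for W in (Wq, Wk, Wv))

# Unfused pattern: 3 MatMuls + 3 Adds.
q, k, v = x @ Wq + bq, x @ Wk + bk, x @ Wv + bv

# Fused pattern: 1 packed MatMul + 1 packed Add over concatenated weights.
W_qkv = np.concatenate([Wq, Wk, Wv], axis=1)
b_qkv = np.concatenate([bq, bk, bv])
qkv = x @ W_qkv + b_qkv

# Splitting the packed output recovers the same Q, K, and V.
q2, k2, v2 = np.split(qkv, [num_heads * head,
                            (num_heads + kv_heads) * head], axis=1)
assert np.allclose(q, q2) and np.allclose(k, k2) and np.allclose(v, v2)
```

The fusion is conditional ("if possible") because the three MatMuls must share the same input and have weights and biases that can be concatenated, e.g. matching dtypes and a common input dimension.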

kunal-vaishnavi merged commit 4ac98d6 into microsoft:main on Mar 13, 2024
94 checks passed
YUNQIUGUO pushed a commit that referenced this pull request Mar 21, 2024