Packed QKV and Rotary Embedding Support for sm<80 GQA #20012

aciddelgado · 2024-03-21T18:24:09Z

Description

Add support for packed QKV input and rotary embedding on GPUs below sm80 (sm<80), using the memory-efficient attention kernel.
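For readers unfamiliar with the two features named above, here is a minimal CPU-side sketch in plain Python. It is not the CUDA code from this PR. It assumes the packed input stores Q, then K, then V along the hidden dimension, with width (num_heads + 2 * kv_num_heads) * head_size, and it shows only the non-interleaved ("rotate half") form of rotary embedding with a base of 10000. The function names `split_packed_qkv` and `rotate_half` are hypothetical and chosen for illustration.

```python
import math


def split_packed_qkv(packed_row, num_heads, kv_num_heads, head_size):
    """Split one token's packed QKV vector into per-head Q, K and V lists.

    Assumed layout (illustrative): [Q heads | K heads | V heads].
    """
    q_size = num_heads * head_size
    kv_size = kv_num_heads * head_size
    assert len(packed_row) == q_size + 2 * kv_size, "unexpected packed width"

    def heads(flat):
        # Cut a flat vector into consecutive chunks of head_size elements.
        return [flat[i:i + head_size] for i in range(0, len(flat), head_size)]

    q = heads(packed_row[:q_size])
    k = heads(packed_row[q_size:q_size + kv_size])
    v = heads(packed_row[q_size + kv_size:])
    return q, k, v


def rotate_half(head, position, base=10000.0):
    """Apply non-interleaved rotary embedding to one head vector.

    Element i is paired with element i + half and the pair is rotated
    by an angle that depends on the token position and on i.
    """
    half = len(head) // 2
    out = [0.0] * len(head)
    for i in range(half):
        angle = position * base ** (-2.0 * i / len(head))
        c, s = math.cos(angle), math.sin(angle)
        out[i] = head[i] * c - head[i + half] * s
        out[i + half] = head[i + half] * c + head[i] * s
    return out


# Toy sizes: 4 query heads share 2 key/value heads (grouped-query attention).
num_heads, kv_num_heads, head_size = 4, 2, 2
packed = [float(x) for x in range((num_heads + 2 * kv_num_heads) * head_size)]

q, k, v = split_packed_qkv(packed, num_heads, kv_num_heads, head_size)
q_rot = [rotate_half(h, position=3) for h in q]  # rotary applies to Q and K
k_rot = [rotate_half(h, position=3) for h in k]  # V is left unrotated
```

Rotary embedding is applied to Q and K only, and each rotation preserves the length of the head vector, so attention scores depend on relative rather than absolute position.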

Motivation and Context

Allows lower-end GPUs (sm<80) to run GQA with packed QKV input and rotary embedding.

onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc

onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu



rotary memory efficient gqa

b14651d

aciddelgado requested review from tianleiwu, yufenglee and kunal-vaishnavi March 21, 2024 18:24

tianleiwu reviewed Mar 21, 2024

onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc (outdated, resolved)

tianleiwu reviewed Mar 21, 2024

onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu (outdated, resolved)

formatting and naming

4f5a21e

kunal-vaishnavi added the release:1.17.3 label Mar 21, 2024

kunal-vaishnavi reviewed Mar 21, 2024

onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc (resolved)

kunal-vaishnavi reviewed Mar 21, 2024

onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu (outdated, resolved)

remove unused print

7908046

tianleiwu previously approved these changes Mar 22, 2024


lint fix

284030a

aciddelgado dismissed tianleiwu’s stale review via 284030a March 22, 2024 23:41

tianleiwu approved these changes Mar 23, 2024


YUNQIUGUO merged commit 4a196d1 into main Mar 23, 2024
95 checks passed

YUNQIUGUO deleted the aciddelgado/fix_rotary_memeff_gqa branch March 23, 2024 21:30
