
[CUDA] Attention kernel provider option #21344

Merged: 13 commits merged into main from tlwu/attention_kernel_cuda_option on Jul 19, 2024

Conversation

tianleiwu (Contributor) commented on Jul 13, 2024

Description

  • Add a CUDA provider option sdpa_kernel to choose which attention kernel to run, for testing purposes.
  • Allow dumping which attention kernel is used per node.
  • Reserve a flag for cuDNN flash attention, which will be added soon.

CUDA provider option sdpa_kernel

Besides the environment variable, we also support setting the kernel choice as a provider option. Note that the setting is global for the session, which helps when benchmarking each kernel individually.
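For instance, here is a minimal Python sketch of passing the option when creating a session. The flag value and model path are placeholders; the actual flag constants are defined in the CUDA execution provider sources:

import onnxruntime as ort

# Hypothetical integer flag selecting a single SDPA kernel; consult the CUDA
# execution provider sources for the real values.
SDPA_KERNEL_FLAG = 1

session = ort.InferenceSession(
    "model.onnx",  # placeholder model containing attention nodes
    providers=[
        ("CUDAExecutionProvider", {"sdpa_kernel": SDPA_KERNEL_FLAG}),
        "CPUExecutionProvider",
    ],
)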

Attention Kernel Debug Info

Set the environment variable ORT_ENABLE_ATTENTION_KERNEL_DEBUG_INFO=1, and ORT will print the SDPA kernel used for each node.

For example:

ORT_ENABLE_ATTENTION_KERNEL_DEBUG_INFO=1 ./onnxruntime_test_all --gtest_filter=MultiHeadAttentionTest*

It will show debug information about the kernel used in each test:

[ RUN      ] MultiHeadAttentionTest.SelfAttention_Batch2_HeadSize32_NoBias_NoMask_PackedQKV
AttentionKernelOptions: FLASH_ATTENTION=0 EFFICIENT_ATTENTION=0 TRT_FUSED_ATTENTION=1 CUDNN_FLASH_ATTENTION=0 TRT_FLASH_ATTENTION=1 TRT_CROSS_ATTENTION=0 TRT_CAUSAL_ATTENTION=0 MATH=1
Operator=MultiHeadAttention Node=node1 DataType=fp16 TRT_FUSED_ATTENTION=1
AttentionKernelOptions: FLASH_ATTENTION=0 EFFICIENT_ATTENTION=1 TRT_FUSED_ATTENTION=0 CUDNN_FLASH_ATTENTION=0 TRT_FLASH_ATTENTION=0 TRT_CROSS_ATTENTION=0 TRT_CAUSAL_ATTENTION=0 MATH=1
Operator=MultiHeadAttention Node=node1 DataType=fp16 EFFICIENT_ATTENTION=1

In this test case, the debug info shows that one session uses TRT fused attention and the other session uses efficient attention.
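As a sketch of reproducing such a comparison outside the unit tests (assuming a Python workflow; the flag values below are hypothetical), two sessions can be created with different sdpa_kernel settings while the debug variable is set:

import os
import onnxruntime as ort

# Enable per-node attention kernel debug output before creating any session.
os.environ["ORT_ENABLE_ATTENTION_KERNEL_DEBUG_INFO"] = "1"

# Hypothetical flag values; the real constants are defined in the CUDA EP sources.
TRT_FUSED_ATTENTION = 4
EFFICIENT_ATTENTION = 2

for flag in (TRT_FUSED_ATTENTION, EFFICIENT_ATTENTION):
    session = ort.InferenceSession(
        "mha_model.onnx",  # placeholder model with MultiHeadAttention nodes
        providers=[("CUDAExecutionProvider", {"sdpa_kernel": flag}),
                   "CPUExecutionProvider"],
    )
    # Running the model prints which attention kernel each node selected.
    # session.run(None, feeds)  # feeds depend on the model's inputs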

Motivation and Context

tianleiwu marked this pull request as draft on July 13, 2024 01:18
tianleiwu marked this pull request as ready for review on July 18, 2024 01:19
tianleiwu merged commit 6ffaaeb into main on Jul 19, 2024
90 of 97 checks passed
tianleiwu deleted the tlwu/attention_kernel_cuda_option branch on July 19, 2024 20:58