Add a rotate_half implementation to the fused_rope operator #56401

Merged: 8 commits merged into PaddlePaddle:develop on Sep 4, 2023

Conversation

tianhaodongbd (Contributor) commented Aug 17, 2023

PR types

Others

PR changes

OPs

Description

Pcard-70459

Add a rotate_half implementation to the fused_rope operator, controlled by the use_neox_rotary_style flag: True selects the rotate_every_two implementation and False selects the rotate_half implementation. The default is True.
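For reference, a minimal NumPy sketch (illustrative only, not the Paddle kernel) of how the two styles rotate a single head vector x of even length, given sin/cos tables of the same length:

import numpy as np

def rotate_every_two(x, sin, cos):
    # Pairs adjacent elements (x0, x1), (x2, x3), ... and rotates each pair.
    rotated = np.empty_like(x)
    rotated[0::2] = -x[1::2]
    rotated[1::2] = x[0::2]
    return x * cos + rotated * sin

def rotate_half(x, sin, cos):
    # Pairs element i with element i + head_dim // 2 and rotates each pair.
    half = x.shape[-1] // 2
    rotated = np.concatenate([-x[half:], x[:half]])
    return x * cos + rotated * sin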

paddle-bot commented Aug 17, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@@ -102,5 +103,91 @@ __global__ void VectorizedFusedRopeKernel(phi::Array<const T*, 3> ins_data,
}
}

template <typename T, typename MPType, int VecSize = 2>
__global__ void VectorizedFusedRopeWithRotateHalfKernel(
Contributor:

  • On CUDA you can define __device__ functions, and a __device__ function can be called from a __global__ function.
  • Splitting into two __global__ functions is also acceptable, but large blocks of copied code should still be avoided.

Contributor Author:

Done.

@@ -27,6 +29,7 @@ def fused_rotary_position_embedding(q, k=None, v=None, sin=None, cos=None):
v (optional|Tensor): The input tensor. The data type is bfloat16, float16, float32 or float64. The shape if v must be [batch_size, seq_len, num_heads, head_dim] and head_dim must be a multiple of 2.
sin (optional|Tensor): The input tensor. The data type is bfloat16, float16, float32 or float64. The shape if sin must be [seq_len, head_dim] or [1, 1, seq_len, head_dim] and head_dim must be a multiple of 2.
cos (optional|Tensor): The input tensor. The data type is bfloat16, float16, float32 or float64. The shape if cos must be [seq_len, head_dim] or [1, 1, seq_len, head_dim] and head_dim must be a multiple of 2.
use_neox_rotary_style(optional|bool): Use "rotate_every_two" when use_neox_rotary_style is True, use "ratate_half" when use_neox_rotary_style is False. Default True.
Contributor:

Explaining it this way is not intuitive; rotate_every_two and rotate_half are not commonly understood, self-explanatory terms.
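For reference, both styles apply the same two-dimensional rotation and differ only in which elements are paired:

x'_i = x_i * cos(theta) - x_j * sin(theta)
x'_j = x_i * sin(theta) + x_j * cos(theta)

rotate_every_two pairs adjacent elements, (i, j) = (2t, 2t+1), while rotate_half pairs each element with its counterpart half the head dimension away, (i, j) = (t, t + head_dim/2).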

Contributor Author:

Done.

# to [1, 1, seq_len, head_dim]
perm = [0, 2, 1, 3]
sin_tensor = paddle.transpose(x=sin_tensor, perm=perm)
cos_tensor = paddle.transpose(x=cos_tensor, perm=perm)
Contributor:

Whether use_neox_rotary_style is True or False, only the update logic for q/k/v differs; the computation of sin/cos is identical, so there is no need to repeat the sin/cos computation in both branches of the if-else.
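A hypothetical sketch of this suggestion (toy shapes, not the PR's code), hoisting the shared transpose out of the branch:

import paddle

use_neox_rotary_style = True
sin_tensor = paddle.randn([1, 8, 1, 10])
cos_tensor = paddle.randn([1, 8, 1, 10])
# The reshape to [1, 1, seq_len, head_dim] is identical for both styles,
# so it is done once before branching.
perm = [0, 2, 1, 3]
sin_tensor = paddle.transpose(x=sin_tensor, perm=perm)
cos_tensor = paddle.transpose(x=cos_tensor, perm=perm)
if use_neox_rotary_style:
    pass  # rotate_every_two update of q/k/v goes here
else:
    pass  # rotate_half update of q/k/v goes here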

Contributor Author:

Done.

if (position_ids.get_ptr()) {
position_ids_data = position_ids->data<int64_t>();

flag_position_ids = true;
Contributor:

There is no need to add such a flag. L63 initializes position_ids_data to null, so the kernel can simply check whether the pointer is null.

Contributor Author:

Done.

bool flag_position_ids = false;
if (position_ids.get_ptr()) {
position_ids_data = position_ids->data<int64_t>();

Contributor:

The dimensions of position_ids need to be checked, position_ids is only needed when sin and cos are passed in, and the shape-check logic for sin and cos needs to be updated accordingly.

[image: sin/cos indexed by position_ids]

In other words, sin and cos only have shape [1, seq_len, 1, head_dim] after being sliced by the indices in position_ids; the tensors actually passed in may have a larger shape.
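A hypothetical Paddle sketch of that slicing (names and shapes illustrative): sin holds rows for up to max_seq_len positions, and position_ids selects one row per token:

import paddle

max_seq_len, batch_size, seq_len, head_dim = 32, 2, 8, 10
sin_table = paddle.randn([max_seq_len, head_dim])
position_ids = paddle.randint(high=max_seq_len, shape=[batch_size, seq_len], dtype='int64')
# Gather one sin row per (batch, position) token, then restore the batch layout.
rows = paddle.gather(sin_table, position_ids.flatten())
sin_selected = rows.reshape([batch_size, seq_len, 1, head_dim])
print(sin_selected.shape)  # [2, 8, 1, 10]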

Contributor Author:

Done.

paddle/phi/kernels/fusion/gpu/fused_rope_utils.h (outdated comment, resolved)
@@ -27,6 +35,8 @@ def fused_rotary_position_embedding(q, k=None, v=None, sin=None, cos=None):
v (optional|Tensor): The input tensor. The data type is bfloat16, float16, float32 or float64. The shape if v must be [batch_size, seq_len, num_heads, head_dim] and head_dim must be a multiple of 2.
sin (optional|Tensor): The input tensor. The data type is bfloat16, float16, float32 or float64. The shape if sin must be [seq_len, head_dim] or [1, 1, seq_len, head_dim] and head_dim must be a multiple of 2.
cos (optional|Tensor): The input tensor. The data type is bfloat16, float16, float32 or float64. The shape if cos must be [seq_len, head_dim] or [1, 1, seq_len, head_dim] and head_dim must be a multiple of 2.
position_ids (optional|Tensor): The input tensor. The data type is int64. The shape if position_ids must be [batch_size, seq_len].
Contributor:

"The shape if" -> "The shape of". Also, the parameter format in the docs should be: position_ids (Tensor, optional).

Contributor Author:

Done.

Xreki previously approved these changes Sep 1, 2023

Xreki (Contributor) left a comment:

LGTM. It is suggested to change the PR title to English.

Comment on lines 50 to 75
import paddle
from paddle.incubate.nn.functional import fused_rotary_position_embedding

q = paddle.randn([1, 1, 4, 10], dtype='float16')
k = paddle.randn([1, 1, 4, 10], dtype='float16')
v = paddle.randn([1, 1, 4, 10], dtype='float16')
out_q, out_k, out_v = fused_rotary_position_embedding(q, k, v)
# batch_size = 2
# seq_len = 8
# num_heads = 2
# head_dim = 10

x = paddle.randn([1, 1, 1, 10], dtype='float16')
y = paddle.randn([1, 1, 1, 10], dtype='float16')
# q, k, v: [batch_size, seq_len, num_heads, head_dim]
q = paddle.randn([2, 8, 2, 10], dtype='float16')
k = paddle.randn([2, 8, 2, 10], dtype='float16')
v = paddle.randn([2, 8, 2, 10], dtype='float16')

# sin, cos: [1, seq_len, 1, head_dim]
x = paddle.randn([1, 8, 1, 10], dtype='float16')
y = paddle.randn([1, 8, 1, 10], dtype='float16')
sin = paddle.sin(x)
cos = paddle.cos(y)
out_q, out_k, out_v = fused_rotary_position_embedding(q, k, v, sin=sin, cos=cos)

# position_ids: [batch_size, seq_len]
position_ids = paddle.randint(high=8, shape=[2, 8], dtype='int64')

# out_q, out_k, out_v: [batch_size, seq_len, num_heads, head_dim]
out_q, out_k, out_v = fused_rotary_position_embedding(q, k, v, sin=sin, cos=cos, position_ids=position_ids, use_neox_rotary_style=False)
print(out_q.shape)
# [2, 8, 2, 10]
Contributor:

Please strictly follow the Google style for code examples, i.e. prefix code with >>> and ..., and if there is output (e.g. print(out_q.shape)), add the exact output after it. See the API documentation writing guide and the sample-code style guide.

Note:

  • This code uses APIs with randomness such as randn; please add a seed in the code so the output is fixed and easy to check.
  • Regarding # required: gpu — does this example need a GPU environment? If so, add the doctest directive at the beginning of the code: >>> # doctest: +REQUIRES(env:GPU)
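A hedged sketch of the requested format (the seed value and GPU directive are illustrative; only the shape is printed so the output is stable):

>>> # doctest: +REQUIRES(env:GPU)
>>> import paddle
>>> from paddle.incubate.nn.functional import fused_rotary_position_embedding
>>> _ = paddle.seed(2023)
>>> q = paddle.randn([2, 8, 2, 10], dtype='float16')
>>> k = paddle.randn([2, 8, 2, 10], dtype='float16')
>>> v = paddle.randn([2, 8, 2, 10], dtype='float16')
>>> out_q, out_k, out_v = fused_rotary_position_embedding(q, k, v, use_neox_rotary_style=False)
>>> print(out_q.shape)
[2, 8, 2, 10]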

Xreki previously approved these changes Sep 1, 2023

Xreki (Contributor) left a comment:

LGTM

XiaoguangHu01 (Contributor) left a comment:

LGTM

sunzhongkai588 (Contributor) left a comment:

LGTM. The API seems to be public, so remember to add the Chinese documentation as well.

MPType* sin_value = out_sin;
MPType* cos_value = out_cos;

if (flag_sin_cos) {
Contributor:

The naming of this parameter does not seem easy to understand; how about reuse_***?

Contributor Author:

OK, will fix in the next PR.

@@ -148,11 +148,11 @@
optional : cache_kv, pre_caches, rotary_pos_emb, time_step, seq_lengths, src_mask, gather_index

- op : fused_rotary_position_embedding
args : (Tensor q, Tensor k, Tensor v, Tensor sin, Tensor cos)
args : (Tensor q, Tensor k, Tensor v, Tensor sin, Tensor cos, Tensor position_ids, bool use_neox_rotary_style = true)
Contributor:

LGTM for add inputs

MARD1NO (Contributor) left a comment:

LGTM

@@ -86,21 +89,42 @@ void FusedRopeGradKernel(const Context& dev_ctx,
sin_cos_data[1] = cos->data<T>();

flag_sin_cos = true;

if (position_ids.get_ptr()) {
Contributor:

This should be able to just use if (position_ids) directly here.

Contributor Author:

OK, will fix in the next PR.

num_inputs,
div_c);
if (use_neox_rotary_style) {
VectorizedFusedRopeWithRotateEveryTwoKernel<T, MPType, vec_size>
Contributor:

I think it might be better to rename the kernel to VectorizedFusedNeoxRopeKernel.

Contributor Author:

OK, will fix in the next PR.

Xreki merged commit c089a2a into PaddlePaddle:develop Sep 4, 2023
25 of 26 checks passed
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
* add rotate_half in fused_rope

* add position_ids in fused_rope

* modified examples about fused_rope

* add set_device in examples