[AutoParallel] Optimize Reshard [part2: optimize _reshard_input] #60022
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
@@ -341,7 +341,9 @@ def insert_cast_op(block, idx, tensor, op_role, tensor_type):
            type=tensor.type,
            lod_level=tensor.lod_level,
        )
-       cast_op = block._insert_op(
+       insert_op = block._insert_op if sync else block._insert_op_without_sync
Here `insert_op` and `cast_op` share the same naming pattern but carry completely different semantics: the former is a function, while the latter is an OP object. Placing them side by side is easy to confuse; please make a clearer distinction.
Renamed `insert_op` to `insert_operation`. The `op` in `cast_op` stands for "operator", which distinguishes the two.
LGTM
…dlePaddle#60022)

* opt reshard parse_op_desc
* recover third party
* change var name
* fix code style
PR types
Performance optimization
PR changes
Others
Description
Profiling in static-graph mode shows that parse_op_desc, called from _reshard_input, invokes sync_with_cpp many times and consumes considerable time. The timing breakdown is shown below:
Further analysis shows that the insert-style operations in parse_op_desc, such as insert_c_concat_op, insert_slice_op, and insert_concat_op, synchronize with the C++ side before every op insertion. This is unnecessary: a single synchronization after all ops have been inserted suffices. Model validation confirms that this optimization does not affect accuracy, which matches the pre-optimization baseline.
Local testing of the time spent before the model's actual run shows a 20.8% improvement (23.889 s -> 18.928 s).
Test environment: local 4-card 1080Ti machine, PaddleNLP Llama2 7B model, static-graph mode (hack config: config.num_hidden_layers = 12).
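The batching pattern described above can be sketched with a toy stand-in for Paddle's Block. This is a hypothetical illustration, not Paddle's actual implementation: `MiniBlock` and `insert_many` are invented names, and `sync_count` simply counts the expensive C++ round-trips that the PR eliminates.

```python
class MiniBlock:
    """Toy stand-in for a Paddle Block; names are illustrative only."""

    def __init__(self):
        self.ops = []
        self.sync_count = 0  # counts expensive Python<->C++ synchronizations

    def _insert_op_without_sync(self, idx, op):
        # insert the op but defer synchronization
        self.ops.insert(idx, op)

    def _insert_op(self, idx, op):
        # original behavior: every insertion triggers a sync
        self._insert_op_without_sync(idx, op)
        self._sync_with_cpp()

    def _sync_with_cpp(self):
        self.sync_count += 1


def insert_many(block, ops, sync=True):
    # mirrors the PR's pattern: choose the insert function once, and when
    # sync=False, perform a single synchronization after all insertions
    insert_operation = block._insert_op if sync else block._insert_op_without_sync
    for i, op in enumerate(ops):
        insert_operation(i, op)
    if not sync:
        block._sync_with_cpp()


old = MiniBlock()
insert_many(old, ["c_concat", "slice", "concat"], sync=True)
print(old.sync_count)  # 3 syncs: one per inserted op

new = MiniBlock()
insert_many(new, ["c_concat", "slice", "concat"], sync=False)
print(new.sync_count)  # 1 sync: batched after all insertions
```

Both paths produce the same op list, so only the number of synchronizations changes, which is why the optimization preserves model accuracy.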
The cProfile results before and after the optimization are shown below; the entries marked with `<-- !!!` are the optimized items. The post-optimization timing visualization is shown below:
Script launch command: