Support Multi-Stream, Single-Thread in New Executor #35024

Merged: 10 commits, Aug 26, 2021

@Aurelius84 Aurelius84 commented Aug 19, 2021

PR types

New features

PR changes

Others

Describe

1. Description

Support Multi-Stream, Single-Thread in New Executor

For Program or Graph topology:

image

image

  • Kernel execution and Memcpy share the same stream.
  • The host launches kernels one after another in topological order.
  • A D2H copy or a CPU kernel blocks the host.

In this PR:
image

  • Kernel execution runs on its own stream and Memcpy runs on a separate stream; events are used to synchronize the two.
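
The two-stream scheme can be illustrated with a small host-only model. This is a sketch under stated assumptions, not the executor's code: `ToyEvent`, `ToyOp`, `ToyStream`, `RunStreams`, and `DemoOrder` are hypothetical names, and the round-robin loop merely stands in for the GPU draining each stream in order. In real CUDA code, the "record" step would correspond to `cudaEventRecord` and the "wait" step to `cudaStreamWaitEvent`.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical host-only stand-ins; none of these are CUDA or Paddle types.
struct ToyEvent {
  bool recorded = false;  // becomes true once the recording op has finished
};

struct ToyOp {
  std::string name;
  ToyEvent* wait_on;  // if non-null, the op may not start until this event is recorded
  ToyEvent* record;   // if non-null, this event is marked recorded when the op finishes
};

using ToyStream = std::deque<ToyOp>;  // ops within one stream run in FIFO order

// Drains the streams round-robin, one op per stream per pass. A stream whose
// head op waits on an unrecorded event is skipped for that pass. Returns the
// order in which ops ran; stops when no stream can make progress.
std::vector<std::string> RunStreams(std::vector<ToyStream> streams) {
  std::vector<std::string> order;
  bool progressed = true;
  while (progressed) {
    progressed = false;
    for (ToyStream& s : streams) {
      if (s.empty()) continue;
      ToyOp& op = s.front();
      if (op.wait_on != nullptr && !op.wait_on->recorded) continue;
      order.push_back(op.name);
      if (op.record != nullptr) op.record->recorded = true;
      s.pop_front();
      progressed = true;
    }
  }
  return order;
}

// One compute stream and one copy stream; the copy depends on kernel1 only.
std::vector<std::string> DemoOrder() {
  ToyEvent kernel1_done;
  ToyStream copy;
  copy.push_back(ToyOp{"d2h_copy", &kernel1_done, nullptr});
  ToyStream compute;
  compute.push_back(ToyOp{"kernel1", nullptr, &kernel1_done});
  compute.push_back(ToyOp{"kernel2", nullptr, nullptr});
  std::vector<ToyStream> streams;
  streams.push_back(copy);     // visited first, but must wait for the event
  streams.push_back(compute);
  return RunStreams(streams);
}
```

In this model the copy cannot start before `kernel1` records its event, yet it does not hold back `kernel2`, which is the ordering property the event-based synchronization is meant to provide.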

2. Why introduce the h2d/d2h operators?

This PR adds two fine-grained data-copy operators so that ops can be managed and scheduled at a finer granularity.
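
One way to picture what explicit copy operators make possible is a pass that turns implicit device/host transfers into separate ops that a scheduler can place on their own stream. The following is only a sketch under loud assumptions: `ToyPlace`, `ToyOpDesc`, and `InsertCopyOps` are hypothetical, the op list is treated as a linear chain where each op consumes the previous op's output, and only the operator names `memcpy_h2d` and `memcpy_d2h` come from the files touched by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not Paddle's OpDesc or Place types.
enum class ToyPlace { kCPU, kGPU };

struct ToyOpDesc {
  std::string type;
  ToyPlace place;
};

// Assumes a linear chain: each op reads the output of the op before it.
// Whenever two consecutive ops run on different places, an explicit copy op
// is inserted between them so the transfer becomes a schedulable unit.
std::vector<ToyOpDesc> InsertCopyOps(const std::vector<ToyOpDesc>& ops) {
  std::vector<ToyOpDesc> out;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (i > 0 && ops[i - 1].place != ops[i].place) {
      const bool to_host = (ops[i].place == ToyPlace::kCPU);
      // Assumption of this sketch: copy ops are tagged as GPU-side work.
      out.push_back(ToyOpDesc{to_host ? "memcpy_d2h" : "memcpy_h2d", ToyPlace::kGPU});
    }
    out.push_back(ops[i]);
  }
  return out;
}
```

For a chain such as `feed` (CPU), `matmul` (GPU), `fetch` (CPU), the sketch yields `feed`, `memcpy_h2d`, `matmul`, `memcpy_d2h`, `fetch`, making each transfer visible to whatever schedules the ops.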

What's Next?

  • Integrate a thread pool to support multi-threaded execution.

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@wanghuancoder wanghuancoder left a comment

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html#group__CUDART__EVENT_1g7b317e07ff385d85aa656204b971a042
According to the official CUDA documentation, creating an Event with the following flag gives better performance for our use case:
cudaEventDisableTiming: Specifies that the created event does not need to record timing data. Events created with this flag specified and the cudaEventBlockingSync flag not specified will provide the best performance when used with cudaStreamWaitEvent() and cudaEventQuery().
Should the flag used in CudaEvent be changed? It is currently cudaEventDefault.

@Aurelius84 (Contributor Author)

Yes, this is already handled. The relevant code is here:

auto cuda_event = std::make_shared<platform::CudaEvent>(
          platform::get_cuda_flags(false, false, false));
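
To make the reply concrete, here is a sketch of how three booleans could map onto CUDA event flags. It is an assumption, not the actual `platform::get_cuda_flags` implementation: the helper name `ToyEventFlags` and the parameter order (enable timing, blocking, interprocess) are hypothetical. The constants are local copies of the CUDA runtime's event-flag values so the block compiles without the CUDA toolkit.

```cpp
#include <cassert>

// Local mirrors of the CUDA runtime event-flag values, redeclared here only
// so that the sketch builds without cuda_runtime_api.h.
constexpr unsigned int kEventDefault = 0x0;        // cudaEventDefault
constexpr unsigned int kEventBlockingSync = 0x1;   // cudaEventBlockingSync
constexpr unsigned int kEventDisableTiming = 0x2;  // cudaEventDisableTiming
constexpr unsigned int kEventInterprocess = 0x4;   // cudaEventInterprocess

// Hypothetical helper. The parameter order is an assumption of this sketch.
constexpr unsigned int ToyEventFlags(bool enable_timing, bool blocking,
                                     bool interprocess) {
  return (blocking ? kEventBlockingSync : kEventDefault) |
         (enable_timing ? kEventDefault : kEventDisableTiming) |
         (interprocess ? kEventInterprocess : kEventDefault);
}

// Under this mapping, (false, false, false) selects exactly the combination
// the CUDA documentation recommends for cudaStreamWaitEvent(): timing
// disabled and blocking sync not set.
static_assert(ToyEventFlags(false, false, false) == kEventDisableTiming,
              "all-false arguments should yield cudaEventDisableTiming only");
```

With the real CUDA runtime, a value like this would be passed as the second argument of `cudaEventCreateWithFlags`.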

@liutiexing previously approved these changes Aug 25, 2021

Review threads:

  • paddle/fluid/framework/new_executor/interpretercore.cc (outdated, resolved)
  • paddle/fluid/framework/new_executor/interpretercore.h (outdated, resolved)
  • paddle/fluid/operators/memcpy_d2h_op.cc (outdated, resolved)
  • paddle/fluid/operators/memcpy_d2h_op.cc (resolved)
  • paddle/fluid/operators/memcpy_d2h_op.h (outdated, resolved)
  • paddle/fluid/operators/memcpy_h2d_op.cc (outdated, resolved)
  • paddle/fluid/operators/memcpy_h2d_op.h (outdated, resolved)


@wanghuancoder wanghuancoder left a comment

LGTM

@lanxianghit lanxianghit left a comment

LGTM for the new C++ operators

@Aurelius84 Aurelius84 merged commit 678a259 into PaddlePaddle:develop Aug 26, 2021

5 participants