Support Multi-Stream, Single-Thread in New Executor #35024

Aurelius84 · 2021-08-19T09:21:10Z

PR types

New features

PR changes

Others

Describe

1. Description

Support Multi-Stream, Single-Thread in New Executor

For Program or Graph topology:

Kernel execution and Memcpy share the same stream
The host launches kernels one by one in topological order
D2H copies and CPU kernels block the host

In this PR:

Kernel execution runs on its own stream and Memcpy runs on a separate stream; events are used to synchronize the two
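
To illustrate the ordering this implies, here is a minimal sketch in plain C++ that models two streams as FIFO queues and an event as a flag. It is a toy simulation, not the PR's implementation and not CUDA code: `Stream`, `Event`, `Task`, `Simulate`, and the task labels are all hypothetical names chosen for the example. The single host thread enqueues every task up front without blocking, and the compute stream only advances past its wait once the copy stream has recorded the event.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a device event: set once the recording stream reaches it.
struct Event { bool recorded = false; };

struct Task {
  enum Kind { kRun, kRecord, kWait } kind;
  std::string label;             // used by kRun only
  std::shared_ptr<Event> event;  // used by kRecord / kWait only
};

// Toy stand-in for a device stream: tasks execute strictly in FIFO order.
struct Stream {
  std::deque<Task> queue;

  void Run(const std::string& label) { queue.push_back({Task::kRun, label, nullptr}); }
  void Record(std::shared_ptr<Event> e) { queue.push_back({Task::kRecord, "", e}); }
  void Wait(std::shared_ptr<Event> e) { queue.push_back({Task::kWait, "", e}); }

  // Advance by at most one task. Returns false if the stream is empty
  // or its front task is a wait on an event that is not yet recorded.
  bool Step(std::vector<std::string>* log) {
    if (queue.empty()) return false;
    Task& t = queue.front();
    if (t.kind == Task::kWait && !t.event->recorded) return false;
    if (t.kind == Task::kRun) log->push_back(t.label);
    if (t.kind == Task::kRecord) t.event->recorded = true;
    queue.pop_front();
    return true;
  }
};

// Returns the order in which the "device" executed the labelled tasks.
std::vector<std::string> Simulate() {
  Stream compute, copy;
  auto h2d_done = std::make_shared<Event>();
  auto kernel_done = std::make_shared<Event>();

  // The host enqueues everything in topological order and never blocks.
  copy.Run("h2d(x)");
  copy.Record(h2d_done);
  compute.Wait(h2d_done);     // kernel must not start before the copy-in
  compute.Run("kernel(x)");
  compute.Record(kernel_done);
  copy.Wait(kernel_done);     // copy-out must not start before the kernel
  copy.Run("d2h(y)");

  // Round-robin "device" loop; compute is polled first on purpose to
  // show that it really is held back by the event.
  std::vector<std::string> log;
  bool progressed = true;
  while (progressed) {
    bool a = compute.Step(&log);
    bool b = copy.Step(&log);
    progressed = a || b;
  }
  assert(compute.queue.empty() && copy.queue.empty());  // no deadlock
  return log;
}
```

Calling `Simulate()` yields the labels in dependency order even though the compute stream is always polled first, which is the property the events are there to guarantee.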

2. Why introduce the h2d/d2h operators?

This PR adds two fine-grained data-copy operators so that ops can be managed and scheduled at a finer granularity.

What's Next?

Integrate thread pool to implement multi-thread.

paddle-bot-old · 2021-08-19T09:21:23Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

wanghuancoder

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html#group__CUDART__EVENT_1g7b317e07ff385d85aa656204b971a042
According to the official CUDA documentation, initializing an Event with the following flag gives better performance for our use case:
cudaEventDisableTiming: Specifies that the created event does not need to record timing data. Events created with this flag specified and the cudaEventBlockingSync flag not specified will provide the best performance when used with cudaStreamWaitEvent() and cudaEventQuery().
Should we change the flag used in CudaEvent? It is currently cudaEventDefault.

Aurelius84 · 2021-08-25T07:32:37Z

Yes, I have already handled this. The relevant code is here:

auto cuda_event = std::make_shared<platform::CudaEvent>(
          platform::get_cuda_flags(false, false, false));
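
As a rough illustration of why that call would address the concern, the sketch below composes event-creation flags from three booleans. The parameter meanings (`enable_timing`, `blocking`, `interprocess`) are an assumption made for this example and are not confirmed by the thread; `SketchCudaFlags` and the `kEvent*` constants are hypothetical names, with values chosen to mirror the CUDA runtime's `cudaEventDefault`, `cudaEventBlockingSync`, `cudaEventDisableTiming`, and `cudaEventInterprocess`.

```cpp
#include <cassert>

// Local constants mirroring the CUDA runtime event-creation flag values.
constexpr unsigned int kEventDefault       = 0x00;  // cudaEventDefault
constexpr unsigned int kEventBlockingSync  = 0x01;  // cudaEventBlockingSync
constexpr unsigned int kEventDisableTiming = 0x02;  // cudaEventDisableTiming
constexpr unsigned int kEventInterprocess  = 0x04;  // cudaEventInterprocess

// Hypothetical helper: the meaning and order of the three booleans is an
// assumption for illustration, not taken from the Paddle source.
constexpr unsigned int SketchCudaFlags(bool enable_timing, bool blocking,
                                       bool interprocess) {
  return kEventDefault |
         (blocking ? kEventBlockingSync : 0u) |
         (enable_timing ? 0u : kEventDisableTiming) |
         (interprocess ? kEventInterprocess : 0u);
}

// Under this assumption, (false, false, false) selects exactly the
// combination the CUDA documentation recommends: timing disabled,
// blocking sync not set.
static_assert(SketchCudaFlags(false, false, false) == kEventDisableTiming,
              "timing disabled, no blocking sync");
```

If the real helper follows this shape, the resulting value could be passed to `cudaEventCreateWithFlags` in place of the default flag.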

paddle/fluid/framework/new_executor/interpretercore.cc

paddle/fluid/framework/new_executor/interpretercore.h

paddle/fluid/operators/memcpy_d2h_op.cc

paddle/fluid/operators/memcpy_d2h_op.h

paddle/fluid/operators/memcpy_h2d_op.cc

paddle/fluid/operators/memcpy_h2d_op.h

wanghuancoder

LGTM

lanxianghit

LGTM for new c++ operators

Modify into QueueSync QueueAsync

756e302

Aurelius84 requested review from wanghuancoder and phlrain August 19, 2021 09:21

Aurelius84 added 7 commits August 19, 2021 11:21

fix complie on MacOS

851a7e5

Merge remote-tracking branch 'upstream/develop' into event_stream2

662fbd9

fix pointer

5de5273

Merge remote-tracking branch 'upstream/develop' into event_stream2

6375212

fix conflict

1d9dcc9

polish unittest

d31f21c

fix windows fetch error

9251d3d

Aurelius84 requested review from zhhsplendid, TCChenlong and XiaoguangHu01 August 25, 2021 07:04

wanghuancoder reviewed Aug 25, 2021

Aurelius84 requested a review from wanghuancoder August 25, 2021 07:33

liutiexing previously approved these changes Aug 25, 2021

wanghuancoder reviewed Aug 25, 2021

polish code according reviewer

0c2f262

Aurelius84 dismissed liutiexing’s stale review via 0c2f262 August 25, 2021 09:39

Aurelius84 requested a review from wanghuancoder August 25, 2021 09:40

fix device_guard on CPU place

2242336

wanghuancoder approved these changes Aug 25, 2021

liutiexing approved these changes Aug 25, 2021

TCChenlong approved these changes Aug 26, 2021

lanxianghit approved these changes Aug 26, 2021

Aurelius84 merged commit 678a259 into PaddlePaddle:develop Aug 26, 2021
