Completes basic dtypes for collective api in eager mode #45574

Merged: 8 commits, Sep 6, 2022

Contributor

@HermitSun HermitSun commented Aug 30, 2022

PR types

New features

PR changes

OPs

Describe

This PR rounds out the basic functionality of the communication framework so that collective operations can transfer a wider range of data types.

  • Supports int8, uint8, and bool dtypes for collective ops in eager mode with the NCCL and GLOO backends (GLOO support is partial, as shown below).
  • Adds tests for all supported dtypes of collective ops in eager mode.

The corresponding changes to the Chinese API documentation are in https://github.com/PaddlePaddle/docs/pull/5237.

TODO:

  • broadcast (NCCL, GLOO)
  • reduce (NCCL, GLOO)
  • scatter (NCCL, GLOO)
  • alltoall (NCCL)
  • alltoall_single (NCCL)
  • sendrecv (NCCL)
  • isend_irecv (NCCL)
  • reduce_scatter (NCCL)
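
The list above can be read as a support matrix of collective op against backend. A minimal sketch in plain Python, assuming only what the list states; `SUPPORTED_BACKENDS`, `NEW_DTYPES`, `is_supported`, and `gloo_gaps` are hypothetical names for illustration and are not part of Paddle's API:

```python
# Hypothetical support matrix transcribed from the list above; not Paddle code.
SUPPORTED_BACKENDS = {
    "broadcast": {"NCCL", "GLOO"},
    "reduce": {"NCCL", "GLOO"},
    "scatter": {"NCCL", "GLOO"},
    "alltoall": {"NCCL"},
    "alltoall_single": {"NCCL"},
    "sendrecv": {"NCCL"},
    "isend_irecv": {"NCCL"},
    "reduce_scatter": {"NCCL"},
}

# Dtypes the PR description says are newly supported.
NEW_DTYPES = ("int8", "uint8", "bool")


def is_supported(op: str, backend: str) -> bool:
    """Return True if the list above names `backend` for `op`."""
    return backend.upper() in SUPPORTED_BACKENDS.get(op, set())


def gloo_gaps() -> list:
    """Return the listed ops that do not name GLOO, sorted by name."""
    return sorted(
        op for op, backends in SUPPORTED_BACKENDS.items()
        if "GLOO" not in backends
    )
```

Calling `gloo_gaps()` gives the ops for which the list names NCCL only, which is the "partial support" the description refers to.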

paddle-bot bot commented Aug 30, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@HermitSun HermitSun force-pushed the collective-basic-dtypes branch 2 times, most recently from a28b1e1 to d0105dd, August 31, 2022 03:37
@HermitSun HermitSun marked this pull request as ready for review August 31, 2022 05:20
python/paddle/distributed/collective.py (outdated)
    """
    if group is not None and not group.is_member():
        return
    dst = _get_group_rank(dst, group)
    if in_dygraph_mode():
        group = _get_default_group() if group is None else group
        backend = _group_map_backend[group]
        assert backend != 'gloo', ("backend gloo is not supported yet")
Contributor

The latest gloo supports send, recv, and alltoall, but the gloo we currently use is a non-mainline branch. We can change this together once it is upgraded.

Contributor Author

The newer gloo version implements alltoall, but send and recv are not implemented. This can be changed together in a follow-up.

Contributor Author

@HermitSun HermitSun Sep 2, 2022

gloo does not seem to implement reduce_scatter either. This can be changed together in a follow-up.
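
The guard quoted at the top of this thread uses `assert`, which Python skips when run with `-O`. A minimal sketch of the same check with an explicit exception, assuming the set of ops gloo lacks is the one named in this thread's later comments (send, recv, reduce_scatter); `GLOO_UNSUPPORTED` and `ensure_backend_supports` are hypothetical names and not Paddle's implementation:

```python
# Hypothetical stand-in for the guard discussed in this thread; not Paddle code.
# The set below is an assumption taken from the comments above and has not
# been verified against gloo itself.
GLOO_UNSUPPORTED = frozenset({"send", "recv", "reduce_scatter"})


def ensure_backend_supports(op_name: str, backend: str) -> None:
    """Raise NotImplementedError if `backend` is gloo and lacks `op_name`."""
    if backend == "gloo" and op_name in GLOO_UNSUPPORTED:
        raise NotImplementedError(
            f"backend gloo does not support {op_name} yet"
        )
```

Unlike an `assert`, the explicit `raise` still fires under `python -O`, and the unsupported set can be shrunk in one place once gloo is upgraded.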

LiYuRio previously approved these changes Sep 1, 2022
Contributor

@gongweibao gongweibao left a comment

LGTM

Contributor

@XieYunshen XieYunshen left a comment

LGTM for PROPERTIES TIMEOUT "300" LABELS "RUN_TYPE=DIST"

A follow-up PR will reduce the unit test execution time.

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@gongweibao gongweibao merged commit 7a92e74 into PaddlePaddle:develop Sep 6, 2022
@HermitSun HermitSun deleted the collective-basic-dtypes branch September 6, 2022 02:49
Caozhou1995 pushed a commit to Caozhou1995/Paddle that referenced this pull request Sep 9, 2022
6 participants