[Auto Parallel] Support MoE expert parallelism in dygraph auto parallel #63904
Conversation
Your PR has been submitted successfully. Thank you for contributing to the open-source project!
Force-pushed 0d71cc8 to 18d8dcb
python/paddle/nn/clip.py (outdated)
    # If the gradient's process ids are a proper subset of clip_input's,
    # use clip_input's local value directly; otherwise reshard clip_input
    # onto the gradient's process mesh.
    if set(g.process_mesh.process_ids) < set(
        clip_input.process_mesh.process_ids
    ):
        clip_input = clip_input._local_value()
    else:
        clip_input = paddle.distributed.reshard(
            clip_input, g.process_mesh, clip_input.placements
        )
You can refine reshard() to support resharding clip_input to g.process_mesh.
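A minimal sketch of what that refinement might look like, assuming reshard() were extended to accept a target mesh whose process ids differ from the input tensor's; the helper name align_to_grad_mesh is hypothetical and not part of this PR:

    import paddle.distributed as dist

    def align_to_grad_mesh(clip_input, g):
        # Hypothetical helper: if reshard() itself handled cross-mesh
        # inputs, the subset check above could collapse into one call.
        # Assumes dist.reshard can place clip_input onto g.process_mesh
        # even when the two meshes cover different process ids, which is
        # exactly the extension the review suggests.
        return dist.reshard(clip_input, g.process_mesh, clip_input.placements)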
    h = self.gate(x)
    if self.config.run_ep:
        # Expert parallelism: split the global tensor into per-mesh
        # local tensors, one per expert.
        local_val_list = (
            dist.auto_parallel.api.local_tensor_list_from_dtensor(
                h, self.config.mesh, 0, [dist.Shard(0)]
            )
        )
    else:
        local_val_list = paddle.split(h, num_or_sections=2, axis=0)
    expert_out_list = []
    for i, expert in enumerate(self.experts):
        local_val = local_val_list[i]
        expert_out_list.append(expert(local_val))
    if self.config.run_ep:
        # Reassemble the local expert outputs into one global
        # distributed tensor.
        out = dist.auto_parallel.api.dtensor_from_local_list(
            expert_out_list, self.config.mesh, [dist.Shard(0)], 0
        )
    else:
        out = paddle.stack(expert_out_list, axis=0)
    out = out.reshape((-1, self.config.class_num))
    return out
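For context, a hedged sketch of how a layer like this might be driven; the mesh shape, tensor sizes, and launch command are assumptions mirroring the snippet above, not this PR's actual test setup:

    import paddle
    import paddle.distributed as dist

    # Two processes, one expert per rank; launch with e.g.
    #   python -m paddle.distributed.launch --devices=0,1 demo.py
    mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
    x = paddle.randn([8, 16])
    # Shard the batch dimension so each rank feeds its local expert.
    dist_x = dist.shard_tensor(x, mesh, [dist.Shard(0)])
    # out = layer(dist_x)  # `layer` is an MoE module like the one above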
As discussed offline, we need to polish the expert parallelism (EP) API.
The comments can be addressed in a follow-up PR.
Sorry to inform you that ad7fd06's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
LGTM for set_tests_properties(test_semi_auto_parallel_simple_net_ep PROPERTIES LABELS "RUN_TYPE=EXCLUSIVE" TIMEOUT 120)
LGTM for ops.yaml
LGTM
PR Category
Auto Parallel
PR Types
New features
Description
Pcard-76459
Support MoE expert parallelism in dygraph auto parallel. In auto-parallel expert parallelism, experts' weights live on different process meshes. This PR implements expert parallelism as follows:
Main changes
1. Add local_tensor_list_from_dtensor and dtensor_from_local_list to transform tensors between global and local meshes (a hedged usage sketch follows this list).
2. Add a skip_check_mesh flag in TensorDistAttr to skip checking whether the process meshes differ.
3. Adapt the computation of tensors with local and global process meshes in grad_clip.
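A hedged sketch of the new API pair, mirroring the call pattern from the MoE layer above; the positional `0` is assumed to be the mesh dimension to split along, inferred from this PR rather than from documented behavior:

    import paddle
    import paddle.distributed as dist

    mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
    h = dist.shard_tensor(paddle.randn([8, 16]), mesh, [dist.Shard(0)])

    # Global -> local: one tensor per sub-mesh along mesh dim 0, each
    # carrying its own (local) process mesh.
    local_list = dist.auto_parallel.api.local_tensor_list_from_dtensor(
        h, mesh, 0, [dist.Shard(0)]
    )

    # Local -> global: reassemble the per-expert outputs into a single
    # distributed tensor on the global mesh.
    out = dist.auto_parallel.api.dtensor_from_local_list(
        local_list, mesh, [dist.Shard(0)], 0
    )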