
[AutoParallel] Fix PHI API inplace output code generation. #59133

Merged: 11 commits into PaddlePaddle:develop on Nov 22, 2023

Conversation

@GhostScreaming (Contributor) commented Nov 19, 2023

PR types

Bug fixes

PR changes

Others

Description

Pcard-73145

This PR fixes the handling of inplace outputs in the PHI API. Currently, operators without SPMD (sharding propagation) rules reshard their inputs to the replicated state by default, and the resulting outputs are therefore also replicated. For inplace outputs this can leave the input unexpectedly modified to the replicated state. For example, with adamw_, the parameters are marked as shard at initialization (the weights of a tensor-parallel-split Linear); after the first iteration the weights degrade to replicated and cannot be restored to their previous state. For this case, the fallback rule needs to reshard the inplace output back to its original dist_attr.

For inplace outputs that do have SPMD rules, SetKernelDistOutput must not set the output's distributed attributes, because the input and output share the same dist_tensor; the correct dist_attr has to be set at the very end, and the output does not need to be resharded.

Example generated code (adamw_, default SPMD rule fallback):

    // Record the inplace output's original dist_attr before the default
    // reshard-to-replicated fallback overwrites it.
    auto dist_out_attr_0 = static_cast<phi::distributed::DistTensor*>((std::get<0>(api_output)).impl().get())->dist_attr();

    auto dist_out_0 = SetKernelDistOutput(&std::get<0>(api_output));
    auto dense_out_0 = dist_out_0 ? dist_out_0->unsafe_mutable_value() : nullptr;
    if (!rank_is_in_current_mesh) {
      *dense_out_0 = phi::DenseTensor(
            std::make_shared<phi::Allocation>(nullptr, 0, phi::distributed::GetDefaultPlace()),
            phi::DenseTensorMeta());
    }

    ...

      // 8. DenseTensor Kernel Call
      using kernel_signature = void(*)(const phi::DeviceContext&, const phi::DenseTensor&, const phi::DenseTensor&, const phi::DenseTensor&, const phi::DenseTensor&, const phi::DenseTensor&, const phi::DenseTensor&, const phi::DenseTensor&, const paddle::optional<phi::DenseTensor>&, const paddle::optional<phi::DenseTensor>&, const phi::Scalar&, const phi::Scalar&, const phi::Scalar&, float, float, bool, bool, int64_t, bool, bool, phi::DenseTensor*, phi::DenseTensor*, phi::DenseTensor*, phi::DenseTensor*, phi::DenseTensor*, phi::DenseTensor*);
      auto* kernel_fn = kernel.GetVariadicKernelFn<kernel_signature>();
      (*kernel_fn)(*dev_ctx, *input_param, *input_grad, *input_learning_rate, *input_moment1, *input_moment2, *input_beta1_pow, *input_beta2_pow, input_master_param, input_skip_update, phi::Scalar(beta1), phi::Scalar(beta2), phi::Scalar(epsilon), lr_ratio, coeff, with_decay, lazy_mode, min_row_size_to_use_multithread, multi_precision, use_global_beta_pow, dense_out_0, dense_out_1, dense_out_2, dense_out_3, dense_out_4, dense_out_5);

    }

    // 9. Set Output Dist Attr For Default Impl
    auto current_process_mesh = paddle::holds_alternative<phi::distributed::TensorDistAttr>(spmd_info.first[0]) ?
               paddle::get<0>(spmd_info.first[0]).process_mesh() : paddle::get<1>(spmd_info.first[0]).at(0).process_mesh();
    // dist_out_0 has no dist_attr if this api has no specified spmd_rules.
    SetReplicatedDistAttrForOutput(dist_out_0, current_process_mesh);
    SetReplicatedDistAttrForOutput(dist_out_1, current_process_mesh);
    SetReplicatedDistAttrForOutput(dist_out_2, current_process_mesh);
    SetReplicatedDistAttrForOutput(dist_out_3, current_process_mesh);
    SetReplicatedDistAttrForOutput(dist_out_4, current_process_mesh);
    SetReplicatedDistAttrForOutput(dist_out_5, current_process_mesh);
    // Set correct dist_attr for inplace output:
    // if there are no spmd_rules, reshard it back to its original dist_attr,
    // otherwise set the correct spmd output dist_attr.
    auto& output_0 = std::get<0>(api_output);
    SetInplaceOutputCorrectDistAttr(dev_ctx, output_0, dist_out_attr_0, true);

Example generated code (add_, with SPMD rules):

    // For an inplace output with SPMD rules, SetKernelDistOutput must not write
    // the output dist_attr here, since input and output share one dist_tensor;
    // it is set at the end by SetInplaceOutputCorrectDistAttr.
    auto dist_out = SetKernelDistOutput(&api_output, spmd_info.second[0]);
    auto dense_out = dist_out->unsafe_mutable_value();
    if (!rank_is_in_current_mesh) {
      *dense_out = phi::DenseTensor(
            std::make_shared<phi::Allocation>(nullptr, 0, phi::distributed::GetDefaultPlace()),
            phi::DenseTensorMeta());
    }

    ...

      // 8. DenseTensor Kernel Call
      using kernel_signature = void(*)(const phi::DeviceContext&, const phi::DenseTensor&, const phi::DenseTensor&, phi::DenseTensor*);
      auto* kernel_fn = kernel.GetVariadicKernelFn<kernel_signature>();
      (*kernel_fn)(*dev_ctx, *input_x, *input_y, dense_out);
    }

    // 9. Set Output Dist Attr For Default Impl
    // API `add` does not need to set DistAttr for output.
    // Set correct dist_attr for inplace output:
    // if there are no spmd_rules, reshard it back to its original dist_attr,
    // otherwise set the correct spmd output dist_attr.
    SetInplaceOutputCorrectDistAttr(dev_ctx, api_output, spmd_info.second[0], false);
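
For reference, a condensed sketch of what SetInplaceOutputCorrectDistAttr does for a single output, paraphrased from the behavior described above (the real helper in this PR operates on vectors of tensors and on the paddle::variant-based dist-attr arguments, and the actual reshard call is elided):

    // Condensed, hypothetical sketch only; not the exact implementation in this PR.
    void SetInplaceOutputCorrectDistAttr(
        phi::DeviceContext* dev_ctx,  // needed by the reshard path
        Tensor& tensor,
        const phi::distributed::TensorDistAttr& dist_attr,
        bool need_reshard) {
      auto* dist_tensor =
          static_cast<phi::distributed::DistTensor*>(tensor.impl().get());
      if (ReshardIsNeeded(dist_tensor->dist_attr(), dist_attr)) {
        if (need_reshard) {
          // Fallback path (no SPMD rules): reshard the inplace output back to
          // the dist_attr recorded before the default replicated reshard.
          // (Reshard call omitted here; see the PR diff for the real code.)
        } else {
          // SPMD-rule path: the kernel already produced a local tensor with the
          // correct shape, so only the dist_attr needs to be corrected.
          dist_tensor->unsafe_set_dist_attr(dist_attr);
        }
      }
    }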

paddle-bot commented Nov 19, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@GhostScreaming changed the title from "[AutoParallel] Fix PHI APU inplace output code generation." to "[AutoParallel] Fix PHI API inplace output code generation." on Nov 19, 2023
Comment on lines 785 to 786
if (ReshardIsNeeded(dist_tensor->dist_attr(), dist_attr[i])) {
if (need_reshard) {
Contributor:

Is there any difference between these two? ReshardIsNeeded and need_reshard feel like nearly the same name.

Contributor Author:

ReshardIsNeeded means a reshard is required when the input dist_tensor->dist_attr() and the target dist_attr[i] are inconsistent. need_reshard is an input parameter: at the PHI API layer we check whether the current API has SPMD rules. If it does, the output does not need to be resharded again, because the output DistAttr inferred by InferSPMD is already correct and the output local tensor produced by the kernel already has the correct shape; we only need to set the output's dist_attr.
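
A minimal illustration of the distinction, assuming ReshardIsNeeded is just a dist_attr comparison (the actual helper may differ):

    // Hypothetical illustration: ReshardIsNeeded is derived from the data and
    // is true when the current and target dist_attr disagree, while
    // need_reshard is a flag passed in by the generated PHI API code and is
    // true only on the fallback (no SPMD rules) path.
    bool ReshardIsNeeded(const phi::distributed::TensorDistAttr& current,
                         const phi::distributed::TensorDistAttr& target) {
      return !(current == target);  // assumes TensorDistAttr is comparable
    }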

Contributor Author:

I'll submit a new PR later and rename it.

Contributor Author:

Fixed, thx~

VLOG(6) << "SetInplaceOutputCorrectDistAttr input "
<< tensors[i].name() << " set its dist_attr from "
<< dist_tensor->dist_attr() << " to " << dist_attr[i];
dist_tensor->unsafe_set_dist_attr(dist_attr[i]);
Contributor:

In the inplace case, if the output's dist_attr is simply dropped like this, can the correctness of the result still be guaranteed?

Contributor Author:

Only APIs with SPMD rules reach this branch. Because the inplace output and the input share the same dist_tensor, the spmd_info result cannot be assigned to the output earlier, otherwise resharding the input would go wrong. Setting the output's dist_attr is therefore deferred to the end, which plays a role similar to SetReplicatedDistAttrForOutput.

FeixLiu previously approved these changes Nov 21, 2023

@FeixLiu (Contributor) left a comment:

LGTM

    // before
    auto dist_t = std::make_shared<phi::distributed::DistTensor>(phi::DDim(),
                                                                 dist_attr);
    // after
    auto dist_t = std::make_shared<phi::distributed::DistTensor>(
        phi::DDim(), paddle::get<0>(dist_attr));
Contributor:

In principle this should use the PADDLE_GET family of macros, or be wrapped in a try/catch; suggest fixing it in a follow-up PR.

Contributor Author:

Sure, I'll fix it together in the next PR~

@FeixLiu FeixLiu merged commit f24d463 into PaddlePaddle:develop Nov 22, 2023
28 checks passed