Fix test_fused_dropout_act_bias failure on H100 #47285

Wong4j · 2022-10-24T06:06:45Z

PR types

Bug fixes

PR changes

Others

Describe

The test_fused_dropout_act_bias unit test passes on A100 but fails on H100. Adding __launch_bounds__(THREADS_PER_CTA) to the FusedDropoutActBiasGrad kernel fixes the failure.
I have already reported this bug internally. My guess is that the nvcc compiler does not allocate resources properly for Hopper. The same bug may also occur on H800, so I am filing this PR.
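For readers unfamiliar with the qualifier, here is a minimal sketch of a kernel annotated with `__launch_bounds__`. It is not the Paddle kernel: `ScaleKernel` and `kThreadsPerCta` are hypothetical names, and the value 512 is an assumption because the value of THREADS_PER_CTA is not shown in this thread. The qualifier tells nvcc the largest block size the kernel will ever be launched with, so the compiler has to keep per-thread register use low enough for a block of that size to fit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical value: the PR's THREADS_PER_CTA constant is not shown in this thread.
constexpr int kThreadsPerCta = 512;

// __launch_bounds__(kThreadsPerCta) promises that no launch uses more than
// kThreadsPerCta threads per block, which bounds the registers nvcc may
// assign to each thread.
template <typename T>
__global__ void __launch_bounds__(kThreadsPerCta)
ScaleKernel(const T* in, T* out, T scale, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * scale;
}

int main() {
  const int n = 1 << 20;
  double* in = nullptr;
  double* out = nullptr;
  if (cudaMalloc(&in, n * sizeof(double)) != cudaSuccess ||
      cudaMalloc(&out, n * sizeof(double)) != cudaSuccess) {
    std::printf("allocation failed\n");
    return 1;
  }
  cudaMemset(in, 0, n * sizeof(double));

  // The launch uses exactly the block size declared in __launch_bounds__.
  const int blocks = (n + kThreadsPerCta - 1) / kThreadsPerCta;
  ScaleKernel<double><<<blocks, kThreadsPerCta>>>(in, out, 2.0, n);

  // Launch-time failures such as cudaErrorLaunchOutOfResources surface here.
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  std::printf("launch status: %s\n", cudaGetErrorString(err));

  cudaFree(in);
  cudaFree(out);
  return err == cudaSuccess ? 0 : 1;
}
```

Launching with a block size larger than the declared bound is an error, so the constant used in the qualifier and the one used at the launch site should be the same symbol, as they are in this sketch.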

Compute-sanitizer error message:

========= COMPUTE-SANITIZER
========= Program hit cudaErrorLaunchOutOfResources (error 701) due to "too many resources requested for launch" on CUDA API call to cudaLaunchKernel.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x454676]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudaLaunchKernel [0x6d4a8]
=========                in /test-hopper/./test.out
=========     Host Frame:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) [0xb777]
=========                in /test-hopper/./test.out
=========     Host Frame:__device_stub__Z23FusedDropoutActBiasGradIdEv15GeluGradFunctorIT_EPKS1_PKhS4_S4_S1_llPS1_S7_(GeluGradFunctor<double>&, double const*, unsigned char const*, double const*, double const*, double, long, long, double*, double*) [0xb5ad]  
=========                in /test-hopper/./test.out
=========     Host Frame:void __wrapper__device_stub_FusedDropoutActBiasGrad<double>(GeluGradFunctor<double>&, double const*&, unsigned char const*&, double const*&, double const*&, double const&, long const&, long const&, double*&, double*&) [0xb64c]        
=========                in /test-hopper/./test.out
=========     Host Frame:void FusedDropoutActBiasGrad<double>(GeluGradFunctor<double>, double const*, unsigned char const*, double const*, double const*, double, long, long, double*, double*) [0xb849]
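One way to investigate a "too many resources requested for launch" error like the one in this log is to ask the runtime how many threads per block a compiled kernel can actually support. The sketch below does that with `cudaFuncGetAttributes`. `HeavyKernel` and `kBlockSize` are hypothetical stand-ins, not the Paddle kernel or its real block size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical block size that the host code intends to launch with.
constexpr int kBlockSize = 512;

// Stand-in kernel without __launch_bounds__; the compiler is free to pick
// its register count without knowing the intended block size.
template <typename T>
__global__ void HeavyKernel(const T* in, T* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * in[i];
}

int main() {
  cudaFuncAttributes attr;
  cudaError_t err = cudaFuncGetAttributes(&attr, HeavyKernel<double>);
  if (err != cudaSuccess) {
    std::printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  // numRegs is the per-thread register count chosen by the compiler;
  // maxThreadsPerBlock is the largest block this kernel can launch with
  // on the current device given that resource usage.
  std::printf("registers per thread: %d\n", attr.numRegs);
  std::printf("max threads per block: %d\n", attr.maxThreadsPerBlock);

  if (kBlockSize > attr.maxThreadsPerBlock) {
    std::printf("a launch with %d threads per block would fail\n", kBlockSize);
  } else {
    std::printf("a launch with %d threads per block fits\n", kBlockSize);
  }
  return 0;
}
```

If the reported maximum is below the intended block size on one architecture but not another, that is consistent with the A100 versus H100 difference described above, though this sketch does not by itself confirm the cause.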

paddle-bot · 2022-10-24T06:06:49Z

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

zkh2016

LGTM

* Reduce squeeze2_matmul_fuse_pass, flattent tests time (#47098)
* Add missing fp32 config and reduce the testing combination
* Reduce trt matmul pass test max examples
* Loose TRT fp16 tests tolerance (#47100)
* Loose TRT half test tolerance to 1e-3 (#47101)
* Loose TRT half test tolerance to 1e-3 (#47106)
* Update distributed_strategy.proto (#46531)
* Close popen pipe after used (#47053)
* Add launch_bounds (#47285)
* Fix TRT UT failures (#47488)
* Format cherry-picked commits
* CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)
* Skip tests that use fused_ops on H100
* Add error message to FusedOps on H100

Co-authored-by: Shijie <505749828@qq.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>

Add launch_bounds

5bb0758

paddle-bot bot added contributor External developers status: proposed labels Oct 24, 2022

Wong4j added the NVIDIA label Oct 24, 2022

paddle-bot bot removed the status: proposed label Oct 24, 2022

zkh2016 approved these changes Oct 27, 2022

zkh2016 merged commit 13181fd into PaddlePaddle:develop Oct 27, 2022

zlsh80826 pushed a commit to zlsh80826/Paddle that referenced this pull request Nov 23, 2022

Add launch_bounds (PaddlePaddle#47285)

812ea0d

zlsh80826 mentioned this pull request Nov 23, 2022

Cherrypick NV fixes to release/2.4 #48263

Merged

Wong4j deleted the fix_fused_dropout_act_bias branch February 14, 2023 05:37
