Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][HIP] Add AMDGPU reflect pass to choose between safe and unsafe AMDGPU atomics #11467

Merged
merged 14 commits into from
Apr 24, 2024

Conversation

hdelan
Copy link
Contributor

@hdelan hdelan commented Oct 9, 2023

AMDGPU reflect pass is needed to choose between safe and unsafe atomics
at the libclc level. In the long run we will delete this patch as work
is being done to ensure correct lowering of atomic instructions. See
patches:

llvm/llvm-project#85052
llvm/llvm-project#69229

This work is necessary as malloc shared atomics rely on PCIe atomics
which can have patchy and unreliable support. Therefore, we want to be
able to choose at compile time whether we should use safe atomics using
CAS (which PCIe should support), or if we want to rely of the
availability of the newest PCIe atomics, if malloc shared atomics are
desired.

Also changes the implementation of atomic_or, atomic_and so that they
can choose between the safe or unsafe version based on the AMDGPU
reflect value.

@hdelan hdelan requested review from a team as code owners October 9, 2023 09:26
@hdelan hdelan requested a review from jchlanda October 9, 2023 09:26
@hdelan hdelan temporarily deployed to WindowsCILock October 9, 2023 09:30 — with GitHub Actions Inactive
@hdelan hdelan changed the title [SYCL][HIP] Hip use unsafe atomics flag [SYCL][HIP] Add reflect pass to choose between safe and unsafe atomic xor Oct 9, 2023
@hdelan hdelan temporarily deployed to WindowsCILock October 9, 2023 10:16 — with GitHub Actions Inactive
Copy link
Contributor

@ldrumm ldrumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely happy about introducing another reflect pass, but I appreciate that we'll likely get better perf than unconditionally prefetching.

What happens to __oclc_amdgpu_reflect if this pass isn't run?

llvm/lib/Target/AMDGPU/AMDGPUReflect.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/AMDGPUReflect.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/AMDGPUReflect.cpp Outdated Show resolved Hide resolved
@hdelan
Copy link
Contributor Author

hdelan commented Oct 11, 2023

What happens to __oclc_amdgpu_reflect if this pass isn't run?

The func will remain in the module and a linking error will result at some stage. Which is the same behaviour as the LLVM reflect pass

@hdelan hdelan requested a review from ldrumm October 11, 2023 10:06
@hdelan hdelan temporarily deployed to WindowsCILock October 12, 2023 10:48 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to WindowsCILock October 12, 2023 11:27 — with GitHub Actions Inactive
@hdelan hdelan closed this Oct 16, 2023
@hdelan hdelan reopened this Mar 27, 2024
@hdelan hdelan force-pushed the hip-use-unsafe-atomics-flag branch from 379d3a6 to e5657aa Compare March 27, 2024 16:44
@hdelan hdelan force-pushed the hip-use-unsafe-atomics-flag branch from e5657aa to b4617ef Compare March 27, 2024 16:49
@hdelan hdelan force-pushed the hip-use-unsafe-atomics-flag branch from b4617ef to 57ae613 Compare March 28, 2024 16:21
@hdelan hdelan force-pushed the hip-use-unsafe-atomics-flag branch 3 times, most recently from 5a8c9ac to 9051df1 Compare March 28, 2024 16:41
@hdelan hdelan force-pushed the hip-use-unsafe-atomics-flag branch 2 times, most recently from 1872497 to bad104b Compare March 28, 2024 17:06
@hdelan hdelan force-pushed the hip-use-unsafe-atomics-flag branch from 0eda761 to 982ea7a Compare April 22, 2024 15:27
@hdelan hdelan force-pushed the hip-use-unsafe-atomics-flag branch from df15e57 to 7f2771b Compare April 22, 2024 15:50
- Change getNumOperands to arg size
- Move size check above the for loop
@hdelan
Copy link
Contributor Author

hdelan commented Apr 22, 2024

@frasercrmck all your comments have been addressed. Thanks for review!

In terms of function vs module pass - I understand that it might be slightly more optimal to make this a module pass, however I also think for comprehensibility this pass should not diverge too much from NVVMReflect, which is a func pass. Let me know if you think that this is OK

Copy link
Contributor

@frasercrmck frasercrmck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hdelan
Copy link
Contributor Author

hdelan commented Apr 23, 2024

Friendly ping @intel/dpcpp-tools-reviewers

Copy link
Contributor

@ldrumm ldrumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good modulo some minor code nits

llvm/lib/Target/AMDGPU/AMDGPUOclcReflect.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/AMDGPUOclcReflect.cpp Outdated Show resolved Hide resolved
- Use a vector of CallInsts instead of Instructions.
- Change assert(fasle) to report_fatal_error.
@hdelan
Copy link
Contributor Author

hdelan commented Apr 23, 2024

Thanks @ldrumm changes made

- Use auto
- Use drop_back to remove null byte
- Replace hip_be with hip
Change opt test to use update_test_checks.py
@ldrumm ldrumm merged commit 34135a3 into intel:sycl Apr 24, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants