Sort fails on Lovelace (sm8.9) GPUs #1874
Comments
MWE:
This happens because on my Lovelace GPU,
I'm also confused by that launch configuration logic. @xaellison, could you explain it? If you want to ensure enough blocks are launched, you should generally use the number returned by the launch configuration (it returns both a maximum thread count and a minimum block count) instead of computing it yourself from device attributes.
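For reference, here is a minimal sketch of the pattern I mean. It assumes the standard CUDA.jl occupancy API (`@cuda launch=false` followed by `launch_configuration(kernel.fun)`). The kernel `double_kernel!` and the helpers `launch_dims` and `double!` are hypothetical names made up for illustration; this is not the sorting code from the package.

```julia
using CUDA

# Hypothetical element-wise kernel, for illustration only: doubles each entry in place.
function double_kernel!(a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds a[i] *= 2
    end
    return nothing
end

# Hypothetical helper: turn a problem size and the occupancy-derived
# thread limit into (threads, blocks). Assumes n >= 1.
function launch_dims(n, max_threads)
    threads = min(n, max_threads)
    blocks = cld(n, threads)   # enough blocks to cover all n elements
    return threads, blocks
end

function double!(a::CuArray)
    isempty(a) && return a
    # Compile the kernel without launching it, then ask the occupancy API
    # for a configuration instead of deriving one from device attributes.
    kernel = @cuda launch=false double_kernel!(a)
    config = launch_configuration(kernel.fun)
    # config.threads is the suggested maximum threads per block;
    # config.blocks is the minimum block count that reaches full occupancy.
    threads, blocks = launch_dims(length(a), config.threads)
    kernel(a; threads, blocks)
    return a
end
```

The point is that the thread limit comes from the compiled kernel rather than from the device, so it accounts for the kernel's register and shared memory usage on whatever architecture it runs on.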
@xaellison Bump, got a minute to explain your reasoning with the launch configuration? (Lines 864 to 873 in a719eb3)
Hey @maleadt, there are two things going on here:
Hopefully I can take a closer look soon and fix this.
Describe the bug
Both quicksort and bitonic sort fail non-deterministically on a Lovelace GPU.
To reproduce
Quick example:
I found this by running `] test CUDA` locally, so the test suite identifies specific failures:
- Testset "reduced block sizes" for quicksort.
- Testset "bitonic sort" (link)

Except for "reduced block sizes", quicksort works. Bitonic sort sometimes works.

Manifest.toml
Expected behavior
A clear and concise description of what you expected to happen.
Version info
Details on Julia:
Details on CUDA:
Additional context
This is on branch: https://github.com/xaellison/CUDA.jl/tree/ae_support_sm_89