Setting thread count per block via NUM_THREADS #11

Open
te42kyfo opened this issue Jun 4, 2024 · 0 comments

te42kyfo commented Jun 4, 2024

In /src/common/util.c there is a function, used in several other places, that determines the number of threads per thread block:

```c
#include <stdlib.h> /* getenv, atoi */

/* Threads per CUDA thread block: NUM_THREADS if set, otherwise 32 (one warp). */
int get_cuda_num_threads(void)
{
    const char* num_threads_env = getenv("NUM_THREADS");
    return (num_threads_env == NULL) ? 32 : atoi(num_threads_env);
}
```

This looks to me like a setting similar to OMP_NUM_THREADS, but I don't think it is useful in the same way. A value below 32 would just leave a partially filled, underutilized warp. A value between 32 and 64 would reduce the total number of threads per SM on some GPUs (consumer GPUs cannot keep as many thread blocks resident per SM as the HPC GPUs). Anything above that would not change the total number of threads per SM, because the GPU would simply place fewer, larger blocks on each SM. In addition, if num_threads is not a divisor of the (GPU-dependent) maximum number of threads per SM, the thread count per SM would oscillate.
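To make the argument about threads per SM concrete, here is a minimal C sketch of a simplified occupancy model that only counts the per-SM thread (warp) and block limits; register and shared-memory limits are ignored. The `sm_limits` struct, the `resident_threads` and `print_occupancy_table` functions, and the two example limit sets (assumed to correspond to compute capability 8.0 and 8.6; check the CUDA C Programming Guide for your own GPU) are illustrative assumptions, not part of the project.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified per-SM limits; registers and shared memory are ignored. */
typedef struct {
    const char *name;
    int max_threads_per_sm;
    int max_blocks_per_sm;
} sm_limits;

/* Useful (non-idle) threads resident on one SM for a given block size. */
int resident_threads(sm_limits sm, int block_size)
{
    assert(block_size > 0 && block_size <= 1024);
    int warps_per_block = (block_size + 31) / 32; /* a partial warp still takes a full warp slot */
    int max_warps = sm.max_threads_per_sm / 32;
    int blocks = max_warps / warps_per_block;     /* limited by warp slots */
    if (blocks > sm.max_blocks_per_sm)
        blocks = sm.max_blocks_per_sm;            /* limited by block slots */
    return blocks * block_size;
}

/* Print a small occupancy table for the two example configurations. */
void print_occupancy_table(void)
{
    /* Assumed limits for compute capability 8.0 and 8.6. */
    const sm_limits gpus[] = { {"cc 8.0", 2048, 32}, {"cc 8.6", 1536, 16} };
    const int sizes[] = { 16, 32, 64, 96, 128, 192, 256, 384, 512, 1024 };
    for (size_t g = 0; g < sizeof gpus / sizeof gpus[0]; ++g)
        for (size_t s = 0; s < sizeof sizes / sizeof sizes[0]; ++s)
            printf("%s  block=%4d  ->  %4d threads/SM\n",
                   gpus[g].name, sizes[s], resident_threads(gpus[g], sizes[s]));
}
```

Under these assumed limits, a block size of 32 uses only 1024 of 2048 (cc 8.0) or 512 of 1536 (cc 8.6) thread slots because the block limit is reached first, while 128 fills both. Sizes such as 96, 192 and 384, which do not divide the 64 warp slots of the cc 8.0 configuration, end up at 2016, 1920 and 1920 threads per SM, which is the oscillation described above.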

I suggest either choosing this value per kernel or changing the default to 128. A block size of 128 always works, independent of the register count, and maximizes occupancy on all GPUs.
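One way the suggested default could look is sketched below in C. It keeps the `NUM_THREADS` override but falls back to 128, and, as an extra assumption beyond this issue, it also rejects values that are not a whole number of warps (a multiple of 32 between 32 and 1024). `parse_num_threads`, `DEFAULT_NUM_THREADS` and `MAX_THREADS_PER_BLOCK` are hypothetical names; this is not the project's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_NUM_THREADS 128    /* default proposed in this issue */
#define MAX_THREADS_PER_BLOCK 1024 /* CUDA per-block thread limit */

static_assert(DEFAULT_NUM_THREADS % 32 == 0, "default must be a whole number of warps");

/* Hypothetical helper: validate a NUM_THREADS string and fall back to the
 * default when it is missing, malformed, out of range, or not a multiple of 32. */
int parse_num_threads(const char *s)
{
    if (s == NULL || *s == '\0')
        return DEFAULT_NUM_THREADS;
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 32 || v > MAX_THREADS_PER_BLOCK || v % 32 != 0) {
        fprintf(stderr, "NUM_THREADS=\"%s\" ignored, using %d\n", s, DEFAULT_NUM_THREADS);
        return DEFAULT_NUM_THREADS;
    }
    return (int)v;
}

/* Threads per CUDA thread block: validated NUM_THREADS, otherwise 128. */
int get_cuda_num_threads(void)
{
    return parse_num_threads(getenv("NUM_THREADS"));
}
```

Keeping the parsing in a separate function makes it testable without touching the environment: `parse_num_threads("256")` returns 256, while `parse_num_threads("48")` prints a warning and returns 128. Choosing the value per kernel instead would replace this lookup with a constant at each launch site.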

The setting does actually have a performance impact:

```
> ./MDBench-VL-NVCC-X86-AVX2-SP
...
Performance: 303.70 million atom updates per second

> NUM_THREADS=128 ./MDBench-VL-NVCC-X86-AVX2-SP
...
Performance: 393.34 million atom updates per second
```
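For scale, the relative gain between the two runs works out to:

$$
\text{speedup} = \frac{393.34}{303.70} \approx 1.295
$$

that is, roughly 29.5% more atom updates per second with `NUM_THREADS=128` in this single comparison.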