Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

including <oneapi/dpl/*> headers cause CUDA_ERROR_NOT_INITIALIZED after fork #1631

Open
jonasdelacour opened this issue Jun 21, 2024 · 2 comments

Comments

@jonasdelacour
Copy link

jonasdelacour commented Jun 21, 2024

Including any <oneapi/dpl> header seems to construct a sycl::queue, which must not be done prior to fork() calls. If you do you get CUDA_ERROR_NOT_INITIALIZED when the child process attempts to destroy this queue and underlying CUDA context.

Here's a minimum example to reproduce this error:

#include <oneapi/dpl/algorithm>
#include <unistd.h>

int main(){
    pid_t pid = fork();
    return 0;
}

Compiled with icpx -fsycl

produces the following stack trace from valgrind:

==11688== Process terminating with default action of signal 6 (SIGABRT)
==11688==    at 0x4FD79FC: __pthread_kill_implementation (pthread_kill.c:44)
==11688==    by 0x4FD79FC: __pthread_kill_internal (pthread_kill.c:78)
==11688==    by 0x4FD79FC: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==11688==    by 0x4F83475: raise (raise.c:26)
==11688==    by 0x4F697F2: abort (abort.c:79)
==11688==    by 0x4914B9D: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30)
==11688==    by 0x492020B: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30)
==11688==    by 0x4920276: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30)
==11688==    by 0x4CB25BA: __clang_call_terminate (in /opt/intel/oneapi/compiler/2024.1/lib/libsycl.so.7.1.0)
==11688==    by 0x4CC0E0F: sycl::_V1::detail::queue_impl::~queue_impl() (in /opt/intel/oneapi/compiler/2024.1/lib/libsycl.so.7.1.0)
==11688==    by 0x4025EA: oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>::~device_policy() (in /home/jonaslacour/playground/a.out)
==11688==    by 0x4F86494: __run_exit_handlers (exit.c:113)
==11688==    by 0x4F8660F: exit (exit.c:143)
==11688==    by 0x4F6AD96: (below main) (libc_start_call_main.h:74)

icpx version:
Intel(R) oneAPI DPC++/C++ Compiler 2024.1.2 (2024.1.2.20240508)
codeplay plugin for Nvidia gpus version:
oneapi-for-nvidia-gpus-2024.1.2-cuda-12.0
nvidia-smi output:

NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:17:00.0 Off |                  Off |
|  0%   41C    P8              18W / 450W |    162MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off | 00000000:4E:00.0 Off |                  Off |
|  0%   33C    P8              10W / 450W |     16MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

OS:
Ubuntu 22.04.4 LTS

@akukanov
Copy link
Contributor

akukanov commented Jun 23, 2024

Yes, oneDPL has a predefined execution policy (dpcpp_default) that creates a SYCL queue at construction.

As a workaround, if that predefined policy is not used in the program, you can define the ONEDPL_USE_PREDEFINED_POLICIES macro to zero before including any oneDPL header. Some details here: https://oneapi-src.github.io/oneDPL/macros.html#additional-macros

@akukanov
Copy link
Contributor

The issue should be fixed by #1652.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants