RuntimeError when calling jt.unique in cuda #346

Open
Taited opened this issue Jun 28, 2022 · 0 comments

Taited commented Jun 28, 2022

Description

While using jittor in a Docker environment on an A100 GPU server, calling jt.unique raises a RuntimeError.

Full Log

[i 0628 12:51:52.026463 96 compiler.py:951] Jittor(1.3.4.4) src: /opt/miniconda/lib/python3.7/site-packages/jittor
[i 0628 12:51:52.030702 96 compiler.py:952] g++ at /usr/bin/g++(7.5.0)
[i 0628 12:51:52.030859 96 compiler.py:953] cache_path: /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default
[i 0628 12:51:52.036395 96 __init__.py:411] Found /usr/local/cuda/bin/nvcc(10.2.89) at /usr/local/cuda/bin/nvcc.
[i 0628 12:51:52.041496 96 __init__.py:411] Found addr2line(2.30) at /usr/bin/addr2line.
[i 0628 12:51:52.467900 96 compiler.py:1006] cuda key:cu10.2.89_sm_80
[i 0628 12:51:52.814636 96 __init__.py:227] Total mem: 1007.70GB, using 16 procs for compiling.
[i 0628 12:51:54.003720 96 jit_compiler.cc:28] Load cc_path: /usr/bin/g++
[i 0628 12:51:56.031813 96 init.cc:62] Found cuda archs: [80,]
[w 0628 12:51:56.228352 96 compiler.py:1356] CUDA arch(80)>75 will be backward-compatible
[i 0628 12:51:57.074075 96 __init__.py:411] Found mpicc(3.1.2) at /usr/local/bin/mpicc.
[i 0628 12:51:57.418543 96 compile_extern.py:30] found /usr/local/cuda/include/cublas.h
[i 0628 12:51:57.441918 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcublas.so
[i 0628 12:51:57.442159 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcublasLt.so.10
[i 0628 12:53:47.091725 96 compile_extern.py:30] found /usr/include/cudnn.h
[i 0628 12:53:47.121692 96 compile_extern.py:30] found /usr/lib/x86_64-linux-gnu/libcudnn.so
[i 0628 12:55:31.574456 96 compile_extern.py:30] found /usr/local/cuda/include/curand.h
[i 0628 12:55:31.641044 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcurand.so
[i 0628 12:55:31.727433 96 compile_extern.py:30] found /usr/local/cuda/include/cufft.h
[i 0628 12:55:31.784981 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcufft.so
[i 0628 12:55:31.889927 96 cuda_flags.cc:32] CUDA enabled.

When calling jt.unique(), it returns:

Exception has occurred: RuntimeError
[f 0628 12:36:06.324679 20 executor.cc:661] 
Execute fused operator(672/734) failed. 
[JIT Source]: /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default/cu10.2.89_sm_80/jit/cutt_transpose_T_1__JIT_1__JIT_cuda_1__index_t_int32__hash_6fb2cc42cc1e932f_op.cc 
[OP TYPE]: cutt_transpose 
[Input]: float32[2,29,256,512,], 
[Output]: float32[2,256,512,29,], 
[Async Backtrace]: not found, please set env JT_SYNC=1, trace_py_var=3 
[Reason]: cudaFuncSetSharedMemConfig(transposePacked<float, 1>, cudaSharedMemBankSizeFourByte ) in file /root/.cache/jittor/cutt/cutt-1.2/src/calls.h:2, function cuttKernelSetSharedMemConfig
Error message: invalid device function

Async error was detected. To locate the async backtrace and get better error report, please rerun your code with two enviroment variables set:
>>> export JT_SYNC=1
>>> export trace_py_var=3
  File "/root/codes/trainer.py", line 121, in run
    all_classes = jt.unique(target_map)

When I set:

>>> export JT_SYNC=1
>>> export trace_py_var=3

The log changed to:

Exception has occurred: RuntimeError
[f 0628 13:03:34.941187 24 executor.cc:661] 
Execute fused operator(13/14) failed. 
[JIT Source]: /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default/cu10.2.89_sm_80/jit/cub_where_Ti_bool__To_int32__NDIM_1__JIT_1__JIT_cuda_1__index_t_int32__hash_4ac929b461bb89b6_op.cc 
[OP TYPE]: cub_where 
[Input]: bool[393215,], 
[Output]: int32[-393215,], 
[Async Backtrace]: --- 
     /opt/miniconda/lib/python3.7/runpy.py:193 <_run_module_as_main> 
     /opt/miniconda/lib/python3.7/runpy.py:85 <_run_code> 
     /root/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/__main__.py:45 <<module>> 
     /root/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py:444 <main> 
     /root/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py:285 <run_file> 
     /opt/miniconda/lib/python3.7/runpy.py:263 <run_path> 
     /opt/miniconda/lib/python3.7/runpy.py:96 <_run_module_code> 
     /opt/miniconda/lib/python3.7/runpy.py:85 <_run_code> 
     main.py:14 <<module>> 
     /opt/miniconda/lib/python3.7/site-packages/jittor/misc.py:539 <unique> 
     /opt/miniconda/lib/python3.7/site-packages/jittor/contrib.py:183 <getitem> 
[Reason]: [f 0628 13:03:34.821809 24 helper_cuda.h:128] CUDA error at /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default/cu10.2.89_sm_80/jit/cub_where_Ti_bool__To_int32__NDIM_1__JIT_1__JIT_cuda_1__index_t_int32__hash_4ac929b461bb89b6_op.cc:68  code=98( cudaErrorInvalidDeviceFunction ) cub::DeviceSelect::Flagged(nullptr, temp_storage_bytes, counting_itr, itr, out_temp, (To*)num_nonzeros, N)
  File "/root/codes/main.py", line 14, in <module>
    jt.unique(x)
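
In case it is easier to reproduce from a script, the same two variables can also be set programmatically before importing jittor (a minimal sketch, assuming jittor picks them up from the process environment just like the exported variables above):

import os
os.environ["JT_SYNC"] = "1"        # force synchronous execution for clearer backtraces
os.environ["trace_py_var"] = "3"   # record Python-level traces

import jittor as jt                # import only after the variables are set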

Environment

Docker Image: leoxiao/openmpi3.1.2-cuda10.2-cudnn8-ubuntu18.04:pt1.8.1-lts  

Jittor: jittor==1.3.4.4  

Device: A100 GPUs

Minimal Reproduction

import jittor as jt

jt.flags.use_cuda = 1                      # enable CUDA execution

x = jt.randint(0, 10, (2, 3, 256, 256))    # random integer tensor

jt.unique(x)                               # raises the RuntimeError above
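
As a temporary workaround I am considering running only the unique call on the CPU (an untested sketch, assuming jt.flags.use_cuda can be toggled between ops):

jt.flags.use_cuda = 0              # temporarily fall back to CPU for this op
all_classes = jt.unique(x)
jt.flags.use_cuda = 1              # restore GPU execution afterwards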

Expected behavior

I would like to find a suitable Docker image for running jittor with CUDA; however, the official Docker image runs very slowly on A100 GPUs.
