RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch) #5318

Open
lsj1111 opened this issue Jul 1, 2024 · 4 comments

lsj1111 commented Jul 1, 2024

The environment is:


sys.platform                     linux
Python                           3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
numpy                            1.24.3
detectron2                       0.6 @/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2
Compiler                         GCC 7.5
CUDA compiler                    CUDA 11.3
detectron2 arch flags            /media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2/_C.cpython-38-x86_64-linux-gnu.so; cannot find cuobjdump
DETECTRON2_ENV_MODULE
PyTorch                          1.10.0 @/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0,1                          NVIDIA GeForce RTX 4090 (arch=8.9)
Driver version                   535.183.01
CUDA_HOME                        :/usr/local/cuda - invalid!
Pillow                           10.3.0
torchvision                      0.11.0 @/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torchvision
torchvision arch flags           /media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torchvision/_C.so; cannot find cuobjdump
fvcore                           0.1.5.post20221221
iopath                           0.1.9
cv2                              4.10.0
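
The report above marks `CUDA_HOME` as invalid, and the value appears to start with a colon. Below is a minimal sketch of a sanity check for that variable. It assumes, and this is not detectron2's own logic, that a usable toolkit root is an existing directory containing `bin/nvcc`. The helper name `cuda_home_problems` is hypothetical.

```python
import os


def cuda_home_problems(value):
    """Return a list of human-readable problems with a CUDA_HOME value.

    Hypothetical helper for illustration; the checks are assumptions about
    what a usable toolkit root looks like, not detectron2's own rules.
    """
    if not value:
        return ["CUDA_HOME is not set"]
    problems = []
    if os.pathsep in value:
        # A path separator suggests a PATH-style export was used by mistake.
        problems.append("contains a path separator: %r" % value)
    if not os.path.isdir(value):
        problems.append("not an existing directory: %r" % value)
    elif not os.path.isfile(os.path.join(value, "bin", "nvcc")):
        problems.append("no bin/nvcc under %r" % value)
    return problems


if __name__ == "__main__":
    found = cuda_home_problems(os.environ.get("CUDA_HOME"))
    for line in found or ["CUDA_HOME looks usable"]:
        print(line)
```

Run in the same shell that launches `train_net.py`, this prints one line per detected problem, or a single confirmation line if none are found.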


This problem arises when I execute train_net.py:

[07/01 22:30:46] d2.engine.train_loop INFO: Starting training from iteration 0
[07/01 22:30:47] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2/engine/train_loop.py", line 155, in train
    self.run_step()
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2/engine/defaults.py", line 498, in run_step
    self._trainer.run_step()
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2/engine/train_loop.py", line 495, in run_step
    loss_dict = self.model(data)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MaskDINO-main/maskdino/maskdino.py", line 267, in forward
    losses = self.criterion(outputs, targets,mask_dict)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MaskDINO-main/maskdino/modeling/criterion.py", line 357, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MaskDINO-main/maskdino/modeling/matcher.py", line 220, in forward
    return self.memory_efficient_forward(outputs, targets, cost)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MaskDINO-main/maskdino/modeling/matcher.py", line 169, in memory_efficient_forward
    cost_mask = batch_sigmoid_ce_loss_jit(out_mask, tgt_mask)
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_neg_add(float* ttargets_1, float* aten_add) {
{
  float v = __ldg(ttargets_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
  aten_add[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (0.f - v) + 1.f;
}
}
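
The environment report lists an RTX 4090 (arch 8.9) together with CUDA 11.3. Compute capability 8.9 was first supported by the CUDA 11.8 toolkit, so an NVRTC from the 11.3 toolkit does not recognize that architecture. The sketch below makes the comparison explicit. The table is a small illustrative subset, and the name `toolkit_knows_capability` is hypothetical.

```python
# Minimum CUDA toolkit release that knows each compute capability.
# Illustrative subset only; see NVIDIA's release notes for the full list.
MIN_CUDA_FOR_CAPABILITY = {
    (8, 0): (11, 0),  # e.g. A100
    (8, 6): (11, 1),  # e.g. RTX 30 series
    (8, 9): (11, 8),  # e.g. RTX 40 series
    (9, 0): (11, 8),  # e.g. H100
}


def toolkit_knows_capability(cuda_version, capability):
    """Return True if the toolkit version is new enough for the capability.

    Both arguments are (major, minor) tuples. Raises KeyError for
    capabilities that are missing from the table above.
    """
    return tuple(cuda_version) >= MIN_CUDA_FOR_CAPABILITY[tuple(capability)]


if __name__ == "__main__":
    # Values taken from the environment report in this issue.
    print(toolkit_knows_capability((11, 3), (8, 9)))  # False
    print(toolkit_knows_capability((11, 8), (8, 9)))  # True
```

This only describes what the toolkit itself accepts. Whether a given PyTorch build passes the device architecture to NVRTC unchanged or falls back to an older one depends on the PyTorch version.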

@github-actions github-actions bot added the needs-more-info More info is needed to complete the issue label Jul 1, 2024

github-actions bot commented Jul 1, 2024

You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template.
The following information is missing: "Instructions To Reproduce the Issue and Full Logs";

@Programmer-RD-AI
Contributor

Hi,
This issue seems to originate in PyTorch itself.
See PyTorch issue #87595: the problem was first reported in 2022, and an update has since been pushed. The following command should get you the latest version:

pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/test/cu117/torch_test.html

If there are still any issues, please feel free to comment :)
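
After reinstalling, it can help to confirm which CUDA version and architectures the installed PyTorch build reports. Below is a minimal sketch using `torch.version.cuda`, `torch.cuda.get_arch_list()`, and `torch.cuda.get_device_capability()`. The function name `describe_torch_env` is hypothetical, and the import is guarded so the script also runs where PyTorch is not installed.

```python
def describe_torch_env():
    """Collect version and architecture details from the installed PyTorch."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # Architectures this build was compiled for, such as "sm_86".
        info["arch_list"] = torch.cuda.get_arch_list()
        # (major, minor) compute capability of GPU 0.
        info["device_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

Comparing `device_capability` with `arch_list` and the reported CUDA version shows whether the build was compiled with the GPU's architecture in mind.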

@lsj1111
Author

lsj1111 commented Jul 3, 2024

Yeah, thank you, I have solved this problem. The official detectron2 documentation seems to list only CUDA 11.3 as supported, so I used CUDA 11.3, which caused the problem above. I then found that detectron2 also works with CUDA 11.6, and that solved the problem.

@github-actions github-actions bot removed the needs-more-info More info is needed to complete the issue label Jul 3, 2024
@Programmer-RD-AI
Contributor

Ah, OK, great :) 👍🏽
