
Errors when using custom data to retrain the ViT transformer #17

Open
superxiaoying opened this issue Feb 22, 2021 · 0 comments

My custom dataset contains 6 classes, so I modified data_utils.py and changed 'num_classes = 6' in train.py (a rough sketch of the dataset change follows the traceback below). But I got these errors:

Training (X / X Steps) (loss=X.X): 0%|| 0/33 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
Training (X / X Steps) (loss=X.X): 0%|| 0/33 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train_trash.py", line 335, in
main()
File "train_trash.py", line 331, in main
train(args, model)
File "train_trash.py", line 211, in train
loss.backward()
File "/root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/autograd/__init__.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
Exception raised from createCublasHandle at /opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/ATen/cuda/CublasHandlePool.cpp:8 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f533ff7077d in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0xcfc185 (0x7f53410d2185 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0xb75 (0x7f53410d3065 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xcef217 (0x7f53410c5217 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #4: at::native::(anonymous namespace)::addmm_out_cuda_impl(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar, c10::Scalar) + 0xf7e (0x7f534242985e in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #5: at::native::mm_cuda(at::Tensor const&, at::Tensor const&) + 0xb3 (0x7f534242b353 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xd14ea0 (0x7f53410eaea0 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0x7b1990 (0x7f5372b9b990 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xbc (0x7f5373383c7c in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #9: at::mm(at::Tensor const&, at::Tensor const&) + 0x4b (0x7f53732d4b0b in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #10: + 0x2c2be8f (0x7f5375015e8f in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #11: + 0x7b1990 (0x7f5372b9b990 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #12: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xbc (0x7f5373383c7c in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::Tensor::mm(at::Tensor const&) const + 0x4b (0x7f537346a10b in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #14: + 0x2a6d094 (0x7f5374e57094 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::generated::AddmmBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x2d5 (0x7f5374e5d055 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #16: + 0x30d1017 (0x7f53754bb017 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x7f53754b6860 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x7f53754b7401 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #19: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7f53754af579 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x7f53797de13a in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #21: + 0xc819d (0x7f537c30f19d in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #22: + 0x76db (0x7f53a0e6c6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #23: clone + 0x3f (0x7f53a01e8a3f in /lib/x86_64-linux-gnu/libc.so.6)
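For context, the dataset change in data_utils.py is roughly along these lines. This is only a sketch with placeholder folder names and no transforms; the important detail is that __getitem__ returns 0-based integer labels in [0, 5] to match num_classes = 6:

```python
# Hypothetical sketch of the custom dataset added to data_utils.py.
# Paths and class folders are placeholders; the key point is that labels
# are 0-based integers in [0, num_classes - 1] = [0, 5].
import os
from PIL import Image
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, root, transform=None):
        self.transform = transform
        # One sub-folder per class, e.g. root/class_a, root/class_b, ...
        self.classes = sorted(os.listdir(root))
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}  # 0..5
        self.samples = [
            (os.path.join(root, c, f), self.class_to_idx[c])
            for c in self.classes
            for f in os.listdir(os.path.join(root, c))
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(path).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, label  # label must stay within [0, 5]
```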

I guess this error is caused by labels that fall outside the valid class range [0, 5], but I can't find where the out-of-range values are introduced. Could you please help me fix this problem?
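For reference, a quick sanity check like the one below (assuming the training DataLoader built in data_utils.py is called train_loader) should confirm whether any label falls outside [0, 5]:

```python
# Diagnostic sketch: scan the training labels for out-of-range values.
# `train_loader` is assumed to be the DataLoader returned by data_utils.py.
num_classes = 6

bad_batches = 0
for step, (images, labels) in enumerate(train_loader):
    if labels.min().item() < 0 or labels.max().item() >= num_classes:
        print(f"Step {step}: labels out of range "
              f"(min={labels.min().item()}, max={labels.max().item()})")
        bad_batches += 1

print("All labels in range." if bad_batches == 0
      else f"{bad_batches} batches contain out-of-range labels.")
```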

Thank you!
