Error training yolo_nas_l #1997
Comments
I don't think it has anything to do with sliding window inference. That code just happens to be nearby in the file where the stack trace is printed. If you look closely at the stack trace, you will see "->" symbols indicating where the error is coming from. I suggest testing whether you can get a single batch or sample from the dataset. To simplify debugging, it's better to turn off all workers (num_workers: 0) when creating the DataLoader. This way you will get the exception in the main thread, with a better exception message that should hopefully give you a clear picture of what is happening. Looking forward to seeing this error message.
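A minimal sketch of the "fetch a single batch" check suggested above. The helper name fetch_first_batch and the stand-in list loader are made up for illustration; with PyTorch you would pass your own DataLoader created with num_workers=0 instead of the list.

```python
def fetch_first_batch(loader):
    """Return the first batch from any iterable loader, or raise a clear error.

    Hypothetical helper, not part of any library.
    """
    iterator = iter(loader)
    try:
        return next(iterator)
    except StopIteration:
        # The loader produced nothing at all: report that explicitly.
        raise RuntimeError(
            "Loader yielded no batches; check dataset length, batch size and drop_last"
        ) from None


# Stand-in loader for illustration only: a list holding one (images, targets) pair.
# With PyTorch, pass e.g. DataLoader(dataset, batch_size=2, num_workers=0) instead,
# so that any dataset exception surfaces in the main process.
fake_loader = [(["img0", "img1"], ["target0", "target1"])]
images, targets = fetch_first_batch(fake_loader)
print(len(images), len(targets))
```

If the dataset itself raises while loading a sample, the exception propagates unchanged from the next() call, which is the point of running with zero workers.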
Thank you kindly for a brief reply.
After some debugging, I found that printing these
results in 'No data fetched from train_loader'.
UPD: removing worker_init_fn from the training dataloader seems to have gotten it started:
It is strange, though, that the progress bar for an epoch now shows 1/1. There are only 4 photos in my dataset (since I was just trying to get training to run), so maybe that's the reason.
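For what it's worth, a 1/1 progress bar is what you would expect whenever the whole dataset fits into one batch. A quick sketch of the arithmetic, using the same rule that the length of a PyTorch DataLoader follows for a map-style dataset; the batch sizes below are assumptions, since the actual value isn't shown in this thread, and batches_per_epoch is a made-up helper name.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches one epoch yields for a map-style dataset."""
    if drop_last:
        # Incomplete final batch is discarded.
        return num_samples // batch_size
    # Incomplete final batch is kept.
    return math.ceil(num_samples / batch_size)


# 4 photos as in the post; batch sizes are assumed values for illustration.
print(batches_per_epoch(4, 16))                  # 1
print(batches_per_epoch(4, 16, drop_last=True))  # 0
print(batches_per_epoch(4, 2))                   # 2
```

Note the middle case: with drop_last enabled and a batch size larger than the dataset, an epoch yields no batches at all.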
💡 Your Question
Hi! I'm stuck trying to train yolo_nas_l on custom data. I have followed several guides and notebooks, yet I constantly run into the same error: 'You can use sliding window validation callback, but your model does not support sliding window inference. Please either remove the callback or use the model that supports sliding inference: "Segformer"'.
Here's the code:
Here's the output:
Please help, your lib looks so promising, yet I don't understand what I'm doing wrong.
Versions
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.9.19 (main, May 6 2024, 20:12:36) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2050
Nvidia driver version: 552.22
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Revision=
Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] onnx==1.15.0
[pip3] onnx-simplifier==0.4.36
[pip3] onnxruntime==1.15.0
[pip3] onnxsim==0.4.36
[pip3] torch==2.3.0
[pip3] torchaudio==2.3.0
[pip3] torchmetrics==0.8.0
[pip3] torchvision==0.18.0
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 pypi_0 pypi
[conda] mkl-service 2.4.0 py39h2bbff1b_0
[conda] mkl_fft 1.3.1 py39h277e83a_0
[conda] mkl_random 1.2.2 py39hf11a4ad_0
[conda] numpy 1.23.0 pypi_0 pypi
[conda] numpy-base 1.24.3 py39h005ec55_0
[conda] pytorch 2.3.0 py3.9_cuda12.1_cudnn8_0 pytorch
[conda] pytorch-cuda 12.1 hde6ce7c_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 2.3.0 pypi_0 pypi
[conda] torchaudio 2.3.0 pypi_0 pypi
[conda] torchmetrics 0.8.0 pypi_0 pypi
[conda] torchvision 0.18.0 pypi_0 pypi