The more information you provide, the better. It would be great if you could share the batch size you are using and whether you are training on a single GPU or with DDP. Is it launched from the CLI or from Jupyter?
PyTorch's DataLoader does not like it when the dataloader length is smaller than num_workers. Any chance you have len(dataloader) < num_workers?
That is my only idea at the moment.
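As a minimal sketch of the suggestion above (assuming the hang is caused by len(dataloader) < num_workers): cap the worker count at the number of batches the loader will actually yield. `safe_num_workers` is a hypothetical helper for illustration, not a super-gradients or PyTorch API.

```python
import math


def safe_num_workers(dataset_len: int, batch_size: int, requested_workers: int,
                     drop_last: bool = False) -> int:
    """Cap num_workers at the number of batches the DataLoader will yield.

    With drop_last=False the loader length is ceil(N / B); with
    drop_last=True it is floor(N / B). Workers beyond that count have no
    batches to produce and can cause hangs on tiny datasets.
    """
    if drop_last:
        num_batches = dataset_len // batch_size
    else:
        num_batches = math.ceil(dataset_len / batch_size)
    return max(0, min(requested_workers, num_batches))


# 10 training images, batch size 4, 8 requested workers -> only 3 batches
print(safe_num_workers(10, 4, 8))  # 3
# 1 validation image, batch size 4 -> a single batch, so at most 1 worker
print(safe_num_workers(1, 4, 8))   # 1
```

The returned value can then be passed as `num_workers=` when constructing the `DataLoader`, so a 10-image dataset never requests more workers than it has batches.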
🐛 Describe the bug
I am using YOLO-NAS to train a model on a dataset with 10 images; the validation set has 1 image. The model trains for 7 epochs and then the process simply halts. No errors are printed and no logs are written. I can't even kill the process.
The same script works fine for larger datasets.
Versions
super-gradients==3.7.1
torch==2.3.1
torchmetrics==0.8.0
torchvision==0.18.1