I am getting nan and no predictions at all. #5815
Comments
👋 Hello @LightCannon, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution. If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you. If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available. For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
@LightCannon this might be a Windows/conda/CUDA 11 bug in PyTorch, as mentioned in some other issues, in which case downgrading to CUDA 10 would solve it. Alternatively, you may have problems with your dataset labels. Check your mosaic JPGs to ensure your labels are correct and follow the instructions here:
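One way to follow the label advice above without waiting for a training run is a quick sanity check of the label files themselves. The sketch below is a hypothetical helper, not part of YOLOv5; it assumes the usual YOLO text-label convention of one `class x_center y_center width height` row per object, with the four box values normalized to [0, 1] relative to the image size.

```python
from pathlib import Path


def check_yolo_label_lines(lines, num_classes):
    """Return (line_number, problem) tuples for the rows of one YOLO label file.

    Assumed format: "class x_center y_center width height" per line,
    box values normalized to [0, 1].
    """
    problems = []
    for i, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # blank lines carry no objects
        if len(parts) != 5:
            problems.append((i, f"expected 5 values, got {len(parts)}"))
            continue
        try:
            cls = int(parts[0])  # class id must be an integer
            coords = [float(v) for v in parts[1:]]
        except ValueError:
            problems.append((i, "unparseable value"))
            continue
        if not 0 <= cls < num_classes:
            problems.append((i, f"class id {cls} outside 0..{num_classes - 1}"))
        # A nan value also fails this comparison, so it is reported here too.
        if any(not 0.0 <= v <= 1.0 for v in coords):
            problems.append((i, "box values must be normalized to [0, 1]"))
        if coords[2] <= 0 or coords[3] <= 0:
            problems.append((i, "width and height must be positive"))
    return problems


def check_label_dir(label_dir, num_classes):
    """Check every *.txt file in a label directory (hypothetical helper)."""
    report = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        issues = check_yolo_label_lines(path.read_text().splitlines(), num_classes)
        if issues:
            report[path.name] = issues
    return report
```

For example, `check_label_dir("path/to/labels/train", num_classes=1)` (path and class count are placeholders) returns a dict mapping file names to their problems; an empty dict only means nothing obviously malformed was found, so viewing the mosaics is still worthwhile.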
I have downgraded to CUDA 10.2 and you are right: this is a bug with CUDA 11.3, and everything works now with CUDA 10.2. Thanks for your help.
@LightCannon
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs. Access additional YOLOv5 🚀 resources:
Access additional Ultralytics ⚡ resources:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed! Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
Related, unsolved: https://docs.ultralytics.com/yolov5/tutorials/hyperparameter_evolution1 https://www.codestudyblog.com/cs2112pyc/1230044131.html
Looks like cuDNN is missing on my system (checked following https://stackoverflow.com/questions/31326015/how-to-verify-cudnn-installation).
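Before reinstalling drivers, it can help to print what PyTorch itself reports. The sketch below uses only standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.backends.cudnn.is_available()`, `torch.backends.cudnn.version()`); the function name `describe_torch_env` is hypothetical. As far as I know, pip wheels of PyTorch typically ship their own CUDA runtime and cuDNN, so these values describe the PyTorch build rather than a system-wide toolkit install.

```python
def describe_torch_env():
    """Collect PyTorch/CUDA/cuDNN facts useful when chasing nan losses.

    Returns a dict; if PyTorch is not installed, only {"torch": None}.
    """
    try:
        import torch
    except ImportError:
        return {"torch": None}
    info = {
        "torch": torch.__version__,
        # CUDA version the installed PyTorch was built with (None for CPU-only builds)
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        # Integer such as 7605, or None if cuDNN is not usable
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```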
Wow, now some hours of driver reinstallation; long way to go...
Solved. See https://pytorch.org/get-started/previous-versions/
My system is now filled with unwanted packages from all the NVIDIA experiments. Maybe it only works because of a mismatch between CUDA 10 and CUDA 11 packages? I have to cross-check this with a fresh Ubuntu install.
@ozett thanks for the feedback! Good to know CUDA 11.6 with driver 510.39.01 and torch==1.7.1+cu101 work well with consumer cards.
Thanks for the encouragement. This case must be specific to GeForce 16xx cards. I will cross-check over the next few days on a fresh Ubuntu 20.x system. Edit: I also want the trained model to run on another install; that will take some time, and I will briefly report the results here.
Tested with a fresh install: that worked and fixed the nan-nan error. Detailed test run:
@ozett thank you. I'm using YOLOv8 and had the same problem. Your comments saved me from excessive head scratching. Environment: GeForce GTX 1650, Windows 11 64-bit, driver 528.02, Python 3.9. Version that works for me: torch==1.9.0+cu102. Some other versions that I tried:
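The `+cuXYZ` suffix in versions such as `torch==1.9.0+cu102` or `torch==1.7.1+cu101` names the CUDA version the wheel was built against (cu102 is CUDA 10.2, cu113 is CUDA 11.3). A small, hypothetical helper for reading it, for example from `torch.__version__`, when comparing setups like the ones reported here:

```python
def parse_torch_build(version):
    """Split a PyTorch version like "1.9.0+cu102" into (base_version, cuda_version).

    cuda_version is None when there is no "+cuXYZ" suffix
    (e.g. "+cpu" builds or a bare "1.9.0").
    """
    base, _, local = version.partition("+")
    digits = local[2:]
    if local.startswith("cu") and len(digits) >= 2 and digits.isdigit():
        # "102" -> "10.2", "113" -> "11.3": the last digit is the minor version
        return base, f"{digits[:-1]}.{digits[-1]}"
    return base, None


print(parse_torch_build("1.7.1+cu101"))  # ('1.7.1', '10.1')
print(parse_torch_build("1.9.0+cu102"))  # ('1.9.0', '10.2')
print(parse_torch_build("1.10.0+cpu"))   # ('1.10.0', None)
```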
Also, because of how Python multiprocessing works on Windows, I had to reduce the number of workers to 1 to maximize GPU utilization. Computer vision is tough.
Search before asking
Question
Hello everyone, I am new to YOLOv5 and I have a problem whose cause I cannot figure out.
I am training on a custom dataset (using a low number of epochs first), but the box and obj losses are nan. Also, no detections appear on the validation images.
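When losses like box and obj turn into nan, it helps to know the first epoch where that happened, since everything after it is meaningless. A minimal sketch, assuming the per-epoch losses have already been collected into dicts keyed by the loss names YOLOv5 prints (`box`, `obj`, `cls`); the function name is hypothetical.

```python
import math


def first_nonfinite_epoch(losses):
    """Return the index of the first epoch whose losses contain nan or inf, else None.

    `losses` is a list of dicts such as {"box": 0.08, "obj": 0.03, "cls": 0.0};
    the key names are assumptions, any numeric values work.
    """
    for epoch, values in enumerate(losses):
        # math.isfinite is False for both nan and +/-inf
        if any(not math.isfinite(v) for v in values.values()):
            return epoch
    return None


history = [{"box": 0.1, "obj": 0.05}, {"box": float("nan"), "obj": float("nan")}]
print(first_nonfinite_epoch(history))  # 1
```

If the losses are already nan at epoch 0, the environment or the labels are more likely culprits than the learning-rate schedule.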
I have used this command to train:
python train.py --img 412 --batch 2 --epochs 2 --data people.yaml --cfg models\yolov5s.yaml --name pm1 --workers 6
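A side note on this command: `--img 412` is not a multiple of 32. As far as I know, YOLOv5 does not fail on this; it warns and rounds the size up to the next multiple of the model's maximum stride (32 for yolov5s), so training actually runs at 416. The rounding amounts to the following sketch (function name hypothetical):

```python
import math


def round_up_to_stride(img_size, stride=32):
    """Round an image size up to the nearest multiple of the model's max stride.

    Assumes ceil-based rounding and a max stride of 32, as for yolov5s.
    """
    return math.ceil(img_size / stride) * stride


print(round_up_to_stride(412))  # 416
```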
There is an issue here also discussing the same problem. However, its comments focus on environment problems, and I still cannot figure out what the problem is. Here is my environment:
and I am working on this dataset: https://github.com/ucuapps/top-view-multi-person-tracking
I would appreciate any help fixing this problem and getting it to work well. Thanks!
Additional
No response