validation with val.py fails with an indexing error #7291

Closed
1 of 2 tasks
lkno0705 opened this issue Apr 5, 2022 · 6 comments · Fixed by #7292
Labels
bug Something isn't working

Comments

@lkno0705

lkno0705 commented Apr 5, 2022

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Validation

Bug

When I train a YOLOv5 model on a custom dataset, everything works as expected: the model is saved, and results are plotted and synced to wandb. But as soon as I run validation separately with val.py, it fails with an IndexError:

YOLOv5m summary: 369 layers, 21190557 parameters, 0 gradients, 49.1 GFLOPs
WARNING: --img-size 1360 must be multiple of max stride 32, updating to 1376
val: Scanning '/home/leon/studienarbeit/EvalFramework/data/datasets/gtsdb/yolo/val/labels.cache' images and labels..
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95:  38%|███▊      | 23/60 [00:0
Traceback (most recent call last):
  File "/home/leon/studienarbeit/EvalFramework/train_yolo.py", line 13, in <module>
    yoloV5.val(dataset=gtsdb_dataset, batch_size=2, weights='yolov5m.pt', img_size=1360)
  File "/home/leon/studienarbeit/EvalFramework/data/models/yolov5/__init__.py", line 25, in val
    yolo_val.run(data=yml, batch_size=batch_size, imgsz=img_size, device=device, weights=weights, task=task)
  File "/home/leon/miniconda3/envs/evalFramework/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/leon/studienarbeit/EvalFramework/data/models/yolov5/yolov5_git/val.py", line 240, in run
    confusion_matrix.process_batch(predn, labelsn)
  File "/home/leon/studienarbeit/EvalFramework/data/models/yolov5/yolov5_git/utils/metrics.py", line 156, in process_batch
    self.matrix[detection_classes[m1[j]], gc] += 1  # correct
IndexError: index 74 is out of bounds for axis 0 with size 44
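The WARNING line in the log above is a separate, harmless adjustment: the model's maximum stride is 32, and 1360 / 32 = 42.5, so the image size is rounded up to the next multiple of 32, which is 1376. A minimal sketch of that rounding (an illustrative re-implementation; the function name is hypothetical, not YOLOv5's own helper):

```python
import math


def round_up_to_stride(img_size: int, stride: int = 32) -> int:
    """Round an image size up to the nearest multiple of the model stride."""
    return math.ceil(img_size / stride) * stride


requested = 1360
new_size = round_up_to_stride(requested, 32)
if new_size != requested:
    # Mirrors the wording of the warning in the log above
    print(f"WARNING: --img-size {requested} must be multiple of max stride 32, updating to {new_size}")
```

Rounding up rather than down keeps the whole input image within the resized canvas.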

For reference:
the dataset yml:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../../../datasets/gtsdb  # dataset root dir
train: yolo/train/images/  # train images (relative to 'path'), 720 images
test: yolo/test/images/  # test images (relative to 'path')
val: yolo/val/images/  # val images (relative to 'path'), 120 images

# Classes
nc: 43  # number of classes
names: [ 'Geschwindigkeitsbegrenzung 20 km/h', 'Geschwindigkeitsbegrenzung 30 km/h', 'Geschwindigkeitsbegrenzung 50 km/h', 'Geschwindigkeitsbegrenzung 60 km/h', 'Geschwindigkeitsbegrenzung 70 km/h', 'Geschwindigkeitsbegrenzung 80 km/h', 'Geschwindigkeitsbegrenzung aufgehoben', 'Geschwindigkeitsbegrenzung 100 km/h', 'Geschwindigkeitsbegrenzung 120 km/h',  'Überholverbot',  'Überholverbot für LKW', 'Vorfahrt',  'Vorfahrtsstraße', 'Vorfahrt gewähren', 'Stopschild',  'Verbot für Fahrzeuge aller Art', 'Verbot für LKW',  'Verbot der Einfahrt',  'Gefahrstelle',  'Kurve links',  'Kurve rechts',  'Doppelkuve',  'Unebene Fahrbahn', 'Schleudergefahr',  'Einseitig verengte Fahrbahn', 'Baustelle', 'Lichtzeichenanlage',  'Fußgängerüberweg', 'spielende Kinder',  'Radfahrer kreuzen', 'Schnee- und Eisglätte', 'Achtung Wildwechsel', 'Alle Streckenverbote aufgehoben', 'Vorgeschriebene Fahrtrichtung rechts',  'Vorgeschriebene Fahrtrichtung links',  'Vorgeschriebene Fahrtrichtung geradeaus', 'Vorgeschriebene Fahrtrichtung geradeaus oder rechts', 'Vorgeschriebene Fahrtrichtung geradeaus oder links', 'Rechts vorbei', 'links vorbei', 'Kreisverkehr',  'Überholverbot aufgehoben',  'Überholverbot für LKW aufgehoben' ]  # class names

I'm calling val.py in a python file as follows:

# yolo_val refers to YOLOv5's val.py module (yolov5_git/val.py in the traceback above)
yml = f"{self.__location__}/yolov5_git/data/{dataset.dataset_id.lower()}.yaml"
batch_size = 2
weights = 'yolov5m.pt'
img_size = 1360
device = 0  # first CUDA GPU
task = 'val'
yolo_val.run(data=yml, batch_size=batch_size, imgsz=img_size, device=device, weights=weights, task=task)

The prepared dataset looks as follows:
[Screenshots: Screenshot_20220405_125253, Screenshot_20220405_125356, Screenshot_20220405_125433]
Every label text file has the following structure:
4 0.6889705882352941 0.5975 0.01764705882352935 0.030000000000000027
If there are multiple objects in the image, each object's annotation is written on its own line:

23 0.5422794117647058 0.5549999999999999 0.02132352941176474 0.032500000000000084
2 0.5422794117647058 0.58375 0.015441176470588291 0.02499999999999991
9 0.5426470588235295 0.608125 0.016176470588235348 0.026249999999999996
23 0.32389705882352937 0.551875 0.022794117647058798 0.03374999999999995
2 0.3242647058823529 0.5800000000000001 0.016176470588235292 0.02749999999999997
9 0.32499999999999996 0.60625 0.01470588235294118 0.025000000000000022
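Each label line follows the YOLO format `class x_center y_center width height`, with the four box values normalized to [0, 1] by the image width and height. A minimal sketch that converts one such line to pixel corner coordinates; the function name is hypothetical, and the 1360 × 800 image size used in the usage line is an assumption (the GTSDB frame size, which also makes the sample line above map to whole pixels):

```python
def yolo_to_xyxy(line: str, img_w: int, img_h: int):
    """Parse 'cls xc yc w h' (normalized) into (cls, x1, y1, x2, y2) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    cls = int(parts[0])
    xc, yc, w, h = (float(p) for p in parts[1:])
    # Shift from box center to corners, then scale by the image size
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return cls, x1, y1, x2, y2


# Assumed 1360 x 800 image: roughly (4, 925.0, 466.0, 949.0, 490.0)
print(yolo_to_xyxy("4 0.6889705882352941 0.5975 0.01764705882352935 0.030000000000000027", 1360, 800))
```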

The train split contains 720 images; the val and test splits contain 120 images. My first guess was a labeling problem, so I checked all annotations, but all their classes are correct.
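A check like the one described above can be automated: every class id must lie in [0, nc), i.e. 0 to 42 for `nc: 43`. A minimal sketch, assuming one `.txt` label file per image in a single directory (for example `yolo/val/labels`, inferred from the `labels.cache` path in the log); the function name is hypothetical:

```python
from pathlib import Path


def find_bad_labels(label_dir: str, nc: int):
    """Return (file name, line number, class id) for every class id outside [0, nc)."""
    bad = []
    for txt in sorted(Path(label_dir).glob("*.txt")):
        for i, line in enumerate(txt.read_text().splitlines(), start=1):
            if not line.strip():
                continue  # ignore blank lines
            cls = int(float(line.split()[0]))  # tolerate ids written as "4.0"
            if not 0 <= cls < nc:
                bad.append((txt.name, i, cls))
    return bad


# Usage (path assumed): an empty list means every class id fits the dataset
# print(find_bad_labels("yolo/val/labels", nc=43))
```

Because this only inspects ground-truth labels, it passes here; as the rest of the thread shows, the out-of-range class id came from the model's predictions, not from the labels.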

Environment

  • YOLOv5: YOLOv5 🚀 v6.0-392-g0a20c80 torch 1.10.0+cu102 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11019MiB)
  • Available GPUs: 2x NVIDIA GeForce RTX 2080 Ti, 11019MiB
  • CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
  • RAM: 64GB
  • OS:
    • Distributor ID: Ubuntu
    • Description: Ubuntu 20.04.4 LTS
    • Release: 20.04
    • Codename: focal
  • NVIDIA driver version: 495.29.05
  • CUDA version: 11.5
  • Python environment: Conda
  • dependencies:
    • _libgcc_mutex=0.1=main
    • _openmp_mutex=4.5=1_gnu
    • ca-certificates=2021.10.26=h06a4308_2
    • certifi=2021.10.8=py39h06a4308_2
    • ld_impl_linux-64=2.35.1=h7274673_9
    • libffi=3.3=he6710b0_2
    • libgcc-ng=9.3.0=h5101ec6_17
    • libgomp=9.3.0=h5101ec6_17
    • libstdcxx-ng=9.3.0=hd4cf53a_17
    • ncurses=6.3=h7f8727e_2
    • openssl=1.1.1m=h7f8727e_0
    • pip=21.2.4=py39h06a4308_0
    • python=3.9.7=h12debd9_1
    • readline=8.1.2=h7f8727e_1
    • setuptools=58.0.4=py39h06a4308_0
    • sqlite=3.37.2=hc218d9a_0
    • tk=8.6.11=h1ccaba5_0
    • tzdata=2021e=hda174b7_0
    • wheel=0.37.1=pyhd3eb1b0_0
    • xz=5.2.5=h7b6447c_0
    • zlib=1.2.11=h7f8727e_4
    • pip:
      • absl-py==1.0.0
      • cachetools==5.0.0
      • charset-normalizer==2.0.7
      • click==8.0.4
      • cycler==0.11.0
      • docker-pycreds==0.4.0
      • fonttools==4.28.2
      • gitdb==4.0.9
      • gitpython==3.1.27
      • google-auth==2.6.0
      • google-auth-oauthlib==0.4.6
      • grpcio==1.44.0
      • idna==3.3
      • importlib-metadata==4.11.1
      • kaggle==1.5.12
      • kiwisolver==1.3.2
      • markdown==3.3.6
      • matplotlib==3.5.0
      • numpy==1.21.4
      • oauthlib==3.2.0
      • opencv-python==4.5.4.60
      • packaging==21.3
      • pandas==1.3.4
      • pathtools==0.1.2
      • pillow==8.4.0
      • promise==2.3
      • protobuf==3.19.4
      • psutil==5.9.0
      • pyasn1==0.4.8
      • pyasn1-modules==0.2.8
      • pycocotools==2.0.4
      • pyparsing==3.0.6
      • python-dateutil==2.8.2
      • python-slugify==5.0.2
      • pytz==2021.3
      • pyyaml==6.0
      • requests==2.26.0
      • requests-oauthlib==1.3.1
      • rsa==4.8
      • scipy==1.7.2
      • seaborn==0.11.2
      • sentry-sdk==1.5.6
      • setuptools-scm==6.3.2
      • shortuuid==1.0.8
      • six==1.16.0
      • smmap==5.0.0
      • tensorboard==2.8.0
      • tensorboard-data-server==0.6.1
      • tensorboard-plugin-wit==1.8.1
      • termcolor==1.1.0
      • text-unidecode==1.3
      • thop==0.0.31-2005241907
      • tomli==1.2.2
      • torch==1.10.0
      • torchvision==0.11.1
      • tqdm==4.62.3
      • typing-extensions==4.0.0
      • urllib3==1.26.7
      • wandb==0.12.10
      • werkzeug==2.0.3
      • yaspin==2.1.0
      • zipp==3.7.0

Minimal Reproducible Example

See bug report above

Additional

Any help would be appreciated! If more information is needed, feel free to ask.

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
lkno0705 added the bug (Something isn't working) label Apr 5, 2022
@github-actions
Contributor

github-actions bot commented Apr 5, 2022

👋 Hello @lkno0705, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of several up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status

If the YOLOv5 CI CPU testing badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

glenn-jocher commented Apr 5, 2022

@lkno0705 we don't assist in debugging custom code, but you can start from the official usage example shown in val.py. If you encounter any reproducible errors following the official usage example please let us know!

yolov5/val.py

Lines 5 to 18 in 5f97001

Usage:
    $ python path/to/val.py --weights yolov5s.pt --data coco128.yaml --img 640

Usage - formats:
    $ python path/to/val.py --weights yolov5s.pt                 # PyTorch
                                      yolov5s.torchscript        # TorchScript
                                      yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                                      yolov5s.xml                # OpenVINO
                                      yolov5s.engine             # TensorRT
                                      yolov5s.mlmodel            # CoreML (MacOS-only)
                                      yolov5s_saved_model        # TensorFlow SavedModel
                                      yolov5s.pb                 # TensorFlow GraphDef
                                      yolov5s.tflite             # TensorFlow Lite
                                      yolov5s_edgetpu.tflite     # TensorFlow Edge TPU

@lkno0705
Author

lkno0705 commented Apr 5, 2022

@glenn-jocher That's understandable. However, the problem also occurs when using the example command in val.py:

python val.py --weights yolov5m.pt --data gtsdb.yaml --img 1360
wandb: Currently logged in as: ***** (use `wandb login --relogin` to force relogin)
val: data=/home/leon/studienarbeit/EvalFramework/data/models/yolov5/yolov5_git/data/gtsdb.yaml, weights=['yolov5m.pt'], batch_size=32, imgsz=1360, conf_thres=0.001, iou_thres=0.6, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.0-392-g0a20c80 torch 1.10.0+cu102 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11019MiB)

Fusing layers... 
YOLOv5m summary: 369 layers, 21190557 parameters, 0 gradients, 49.1 GFLOPs
WARNING: --img-size 1360 must be multiple of max stride 32, updating to 1376
val: Scanning '/home/leon/studienarbeit/EvalFramework/data/datasets/gtsdb/yolo/val/labels.cache' images and labels..
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95:  25%|██▌       | 1/4 [00:04<
Traceback (most recent call last):
  File "/home/leon/studienarbeit/EvalFramework/val.py", line 390, in <module>
    main(opt)
  File "/home/leon/studienarbeit/EvalFramework/val.py", line 363, in main
    run(**vars(opt))
  File "/home/leon/miniconda3/envs/evalFramework/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/leon/studienarbeit/EvalFramework/val.py", line 240, in run
    confusion_matrix.process_batch(predn, labelsn)
  File "/home/leon/studienarbeit/EvalFramework/data/models/yolov5/yolov5_git/utils/metrics.py", line 156, in process_batch
    self.matrix[detection_classes[m1[j]], gc] += 1  # correct
IndexError: index 74 is out of bounds for axis 0 with size 44

@glenn-jocher
Member

glenn-jocher commented Apr 5, 2022

@lkno0705 thanks for the update! It looks like you are passing an incompatible combination of --weights and --data. yolov5m.pt is trained on the COCO dataset; you cannot validate it on anything other than the COCO dataset.
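Why the mismatch surfaces as this particular IndexError: YOLOv5's ConfusionMatrix allocates an (nc + 1) × (nc + 1) array (the extra row and column are for background), so with `nc: 43` it is 44 × 44. The COCO checkpoint predicts class ids 0 to 79, and a predicted id such as 74 falls outside that array. A minimal numpy sketch; the two helpers are hypothetical simplifications of the indexing in `utils/metrics.py`, not YOLOv5's code:

```python
import numpy as np


def make_confusion_matrix(nc: int) -> np.ndarray:
    # One extra row/column for "background", as in YOLOv5's (nc + 1, nc + 1) matrix
    return np.zeros((nc + 1, nc + 1))


def count_match(matrix: np.ndarray, pred_cls: int, gt_cls: int) -> None:
    # Simplified stand-in for: self.matrix[detection_classes[m1[j]], gc] += 1
    matrix[pred_cls, gt_cls] += 1


matrix = make_confusion_matrix(43)  # gtsdb.yaml -> shape (44, 44)
count_match(matrix, 4, 4)           # a GTSDB-range prediction is counted normally
try:
    count_match(matrix, 74, 4)      # id 74 can only come from the 80-class COCO model
except IndexError as err:
    print(err)  # index 74 is out of bounds for axis 0 with size 44
```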

@lkno0705
Author

lkno0705 commented Apr 5, 2022

@glenn-jocher Yikes, that mistake was so dumb. Thanks for the hint! It works now. It makes sense: if you validate the correct model, the predicted classes match the dataset's classes, so no index error occurs during validation. Thanks a lot! It seems like I've looked at my screen for too long today! Have a great day!

lkno0705 closed this as completed Apr 5, 2022
glenn-jocher added a commit that referenced this issue Apr 5, 2022
Improved error messages for understanding of user error with val.py. May help #7291
glenn-jocher linked a pull request Apr 5, 2022 that will close this issue
@glenn-jocher
Member

glenn-jocher commented Apr 5, 2022

@lkno0705 good news 😃! Your original issue may now be improved ✅ in PR #7292. This PR adds more informative error messages to help users self-diagnose the problem.
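The exact change in PR #7292 is not reproduced here. As a hedged illustration of the kind of early check that turns this IndexError into a readable message, the hypothetical helper below compares the checkpoint's class count (assumed to be read from the loaded model elsewhere) with the dataset's `nc` before validation starts:

```python
def check_weights_match_data(model_nc: int, data: dict, weights_name: str = "weights") -> None:
    """Raise early if the checkpoint's class count differs from the dataset's nc.

    Hypothetical helper, not part of YOLOv5; `data` is the parsed dataset YAML.
    """
    nc = int(data["nc"])
    names = data.get("names", [])
    if names and len(names) != nc:
        raise ValueError(f"dataset lists {len(names)} names but nc={nc}")
    if model_nc != nc:
        raise ValueError(
            f"{weights_name} ({model_nc} classes) was trained on a different dataset than "
            f"--data ({nc} classes); pass --weights and --data that were trained together"
        )


# Usage: the COCO checkpoint (80 classes) against gtsdb.yaml (43 classes) fails fast
# check_weights_match_data(80, {"nc": 43, "names": [...]}, "yolov5m.pt")
```

Failing before the first batch avoids running inference only to crash inside the metrics code.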

To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View updated notebooks (Open In Colab, Open In Kaggle)
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this issue Aug 26, 2022
Improved error messages for understanding of user error with val.py. May help ultralytics#7291