KeyError occur when start training #958

Closed
Zzh-tju opened this issue Sep 12, 2020 · 26 comments · Fixed by #3366
Labels
question Further information is requested Stale

Comments

@Zzh-tju

Zzh-tju commented Sep 12, 2020

❔Question

@glenn-jocher
I'm currently working on face detection and train with the following command:
python train.py --img 640 --batch 16 --epochs 5 --data ./data/face.yaml --cfg ./models/yolov5s.yaml --weights yolov5s.pt
The training images are:

../face/images/train/6000.jpg
../face/images/train/6001.jpg
../face/images/train/6002.jpg
......

And their corresponding labels are:

../face/labels/train/6000.txt
../face/labels/train/6001.txt
../face/labels/train/6002.txt
......

But I get an error:

WARNING: /media/zzh/face/images/train/9994.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9995.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9996.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9997.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9998.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9999.jpg: setting an array element with a sequence.
Scanning images: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6120/6120 [00:01<00:00, 4498.79it/s]
Traceback (most recent call last):
  File "train.py", line 456, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 169, in train
    world_size=opt.world_size, workers=opt.workers)
  File "/media/zzh/yolov5/utils/datasets.py", line 61, in create_dataloader
    rank=rank)
  File "/media/zzh/yolov5/utils/datasets.py", line 380, in __init__
    labels, shapes = zip(*[cache[x] for x in self.label_files])
  File "/media/zzh/yolov5/utils/datasets.py", line 380, in <listcomp>
    labels, shapes = zip(*[cache[x] for x in self.label_files])
KeyError: '/media/zzh/face/labels/train/10000.txt'

And face/labels/train/10000.txt contains:
0 0.6062500000000001 0.6017543859649123 0.3775 0.5719298245614035

I don't know how to solve this problem.
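For an images/labels layout like the one above, a quick sanity check is to list every image whose expected label file is missing. The sketch below assumes each label path is obtained by swapping the last images directory component for labels and the extension for .txt; the repository's exact rule has changed between versions, and expected_label_path and missing_labels are hypothetical helpers, not YOLOv5 functions.

```python
from pathlib import Path
from typing import List

# Assumed subset of image formats; adjust to your data.
IMG_SUFFIXES = {".bmp", ".jpeg", ".jpg", ".png"}


def expected_label_path(img_path: Path) -> Path:
    """Map .../images/.../name.jpg to .../labels/.../name.txt.

    Swaps the LAST 'images' path component (an assumption of this sketch)
    and raises ValueError if the path has no 'images' component at all.
    """
    parts = list(img_path.parts)
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return Path(*parts).with_suffix(".txt")


def missing_labels(images_dir: Path) -> List[Path]:
    """Return every image under images_dir whose expected label file does not exist."""
    return [
        img
        for img in sorted(images_dir.rglob("*"))
        if img.suffix.lower() in IMG_SUFFIXES and not expected_label_path(img).exists()
    ]
```

For example, missing_labels(Path("../face/images/train")) returns an empty list when every image has a matching .txt file under ../face/labels/train.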

@Zzh-tju Zzh-tju added the question Further information is requested label Sep 12, 2020
@Zzh-tju Zzh-tju changed the title KeyError occured when start training KeyError occur when start training Sep 12, 2020
@github-actions
Contributor

github-actions bot commented Sep 12, 2020

Hello @Zzh-tju, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@Zzh-tju
Author

Zzh-tju commented Sep 12, 2020

When I changed labels, shapes = zip(*[cache[x] for x in self.label_files])
back to labels, shapes = zip(*[cache[x] for x in self.img_files]) in utils/datasets.py,
the error changed too:

20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]
 24      [17, 20, 23]  1     18879  models.yolo.Detect                      [2, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.25779e+06 parameters, 7.25779e+06 gradients

Transferred 362/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
TypeError: float() argument must be a string or a number, not 'tuple'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 456, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 169, in train
    world_size=opt.world_size, workers=opt.workers)
  File "/media/zzh/yolov5/utils/datasets.py", line 61, in create_dataloader
    rank=rank)
  File "/media/zzh/yolov5/utils/datasets.py", line 383, in __init__
    self.shapes = np.array(shapes, dtype=np.float64)
ValueError: setting an array element with a sequence.
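The chained TypeError/ValueError above is NumPy's generic complaint when the list passed to np.array(..., dtype=np.float64) is ragged, i.e. some entry is not a flat (width, height) pair of numbers. A minimal reproduction independent of YOLOv5, using made-up shapes; which cache entry was actually malformed in this run is not shown in the thread:

```python
import numpy as np

# Well-formed: every entry is a (width, height) pair of numbers.
good = [(640, 480), (1280, 720)]
arr = np.array(good, dtype=np.float64)
print(arr.shape)  # (2, 2)

# Malformed: one entry holds a nested tuple where a number is expected.
bad = [(640, 480), ((640, 480), 720)]
try:
    np.array(bad, dtype=np.float64)
except ValueError as err:
    # The message begins with "setting an array element with a sequence".
    print("ValueError:", err)
```

Later comments in this thread report that deleting stale *.cache files made this error go away, which is consistent with the shapes coming from a cache that no longer matched the code.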

@karen-gishyan

karen-gishyan commented Sep 12, 2020

Same issue here, @Zzh-tju. It is new: I had no problems until yesterday.

@glenn-jocher
Member

glenn-jocher commented Sep 12, 2020

@Zzh-tju @karen-gishyan
Hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

  • Your changes to the default repository. If your issue is not reproducible in a fresh git clone of this repository, we cannot debug it. Before going further, run this code and ensure your issue persists:
sudo rm -rf yolov5  # remove existing
git clone https://github.com/ultralytics/yolov5 && cd yolov5 # clone latest
python detect.py  # verify detection
# CODE TO REPRODUCE YOUR ISSUE HERE
  • Your custom data. If your issue is not reproducible with COCO or COCO128 data, we cannot debug it. Visit our Custom Training Tutorial for guidelines on training your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.

  • Your environment. If your issue is not reproducible in one of the verified environments below, we cannot debug it. If you are running YOLOv5 locally, ensure your environment meets all of the requirements.txt dependencies specified below.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

@karen-gishyan

karen-gishyan commented Sep 13, 2020

Hello @glenn-jocher , and thanks for your reply. I could see that the issue was with the way the labels were being read. I looked at your commit history in utils/datasets.py, and went back to your previous version, which solved the problem.
self.label_files = [x.replace('images', 'labels').replace(os.path.splitext(x)[-1], '.txt') for x in self.img_files]
I think the new change is most likely the source of the issue. Thanks.

@sophiatmu

@karen-gishyan I got the same problem as you, but it was preceded by "WARNING: /home/TrafficLight/JPEGImages/10141_0_1.jpg: image size <10 pixels". Which previous version did you use? Thanks for your reply.

@lolpa1n

lolpa1n commented Sep 13, 2020

@karen-gishyan same problem here. How did you solve it? The fixes in datasets.py didn't solve the problem:

Optimizer groups: 86 .bias, 94 conv.weight, 83 other
TypeError: float() argument must be a string or a number, not 'tuple'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 456, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 169, in train
    rank=rank, world_size=opt.world_size, workers=opt.workers)
  File "/content/yolov5/utils/datasets.py", line 61, in create_dataloader
    rank=rank)
  File "/content/yolov5/utils/datasets.py", line 379, in __init__
    self.shapes = np.array(shapes, dtype=np.float64)
ValueError: setting an array element with a sequence.

@karen-gishyan

karen-gishyan commented Sep 13, 2020

@sophiatmu I know this is a temporary solution until the authors take a look at it, but in utils/datasets.py I changed the code on lines 366-367 to the following, which is from the previous commit, and the model worked again.

        self.label_files = [x.replace('images', 'labels').replace(os.path.splitext(x)[-1], '.txt') for x in
                            self.img_files]
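For reference, the reverted rule replaces every occurrence of 'images' anywhere in the path, not only the directory name. A small sketch (old_label_path is a hypothetical wrapper around the quoted expression) showing the intended case and a parent folder whose name happens to contain 'images':

```python
import os


def old_label_path(img_path: str) -> str:
    """Wrapper around the pre-fix label rule quoted above."""
    # str.replace without a count swaps EVERY 'images' substring in the path,
    # then the image extension is swapped for '.txt'.
    return img_path.replace('images', 'labels').replace(os.path.splitext(img_path)[-1], '.txt')


print(old_label_path('../face/images/train/6000.jpg'))
# ../face/labels/train/6000.txt
print(old_label_path('/data/my_images/images/train/1.jpg'))
# /data/my_labels/labels/train/1.txt  (the parent folder is renamed too)
```

The second result points at a my_labels folder that probably does not exist; a stricter match that requires path separators on both sides of images avoids this.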

glenn-jocher added a commit that referenced this issue Sep 13, 2020
@glenn-jocher
Member

glenn-jocher commented Sep 13, 2020

@Zzh-tju @karen-gishyan @lolpa1n @sophiatmu I've pushed a fix which should restore similar functionality to before.

yolov5/utils/datasets.py

Lines 365 to 368 in 806e75f

# Define labels
sa, sb = os.sep + 'images' + os.sep, os.sep + 'labels' + os.sep # /images/, /labels/ substrings
self.label_files = [x.replace(sa, sb, 1).replace(os.path.splitext(x)[-1], '.txt') for x in self.img_files]

Note that label paths are defined as the image paths with a .replace() statement that will replace the first instance of /images/ with /labels/ in your image paths (a count of 1 makes str.replace substitute only the leftmost match).
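Python's str.replace(old, new, 1) substitutes only the first match, counting from the left. If the goal is to swap the last /images/ component instead, splitting once from the right with str.rsplit is one option. A sketch comparing both on a path that contains /images/ twice; label_path_first mirrors the quoted fix, while label_path_last is an illustrative variant and not the code at 806e75f:

```python
import os

sa, sb = os.sep + 'images' + os.sep, os.sep + 'labels' + os.sep  # /images/, /labels/


def label_path_first(x: str) -> str:
    # Same expression as the fix quoted above: replaces only the FIRST /images/.
    return x.replace(sa, sb, 1).replace(os.path.splitext(x)[-1], '.txt')


def label_path_last(x: str) -> str:
    # Illustrative variant: rsplit(sa, 1) splits at the LAST /images/,
    # and joining with sb puts /labels/ back in its place.
    return sb.join(x.rsplit(sa, 1)).replace(os.path.splitext(x)[-1], '.txt')


img = os.path.join(os.sep, 'images', 'face', 'images', 'train', '1.jpg')
print(label_path_first(img))  # /labels/face/images/train/1.txt on POSIX
print(label_path_last(img))   # /images/face/labels/train/1.txt on POSIX
```

For a layout like ../face/images/train/6000.jpg, where /images/ appears only once, both functions return the same label path.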

@glenn-jocher
Member

The dataset structure example provided by @Zzh-tju should work with no issues:

../face/images/train/6000.jpg
../face/images/train/6001.jpg
../face/images/train/6002.jpg
......

And their corresponding labels are

../face/labels/train/6000.txt
../face/labels/train/6001.txt
../face/labels/train/6002.txt
......

CI tests on fix 806e75f are all green.
https://github.com/ultralytics/yolov5/actions/runs/252688252

@Zzh-tju
Author

Zzh-tju commented Sep 14, 2020

OK, fixed.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@21143

21143 commented Oct 22, 2020

Hi. I am facing the same issue when training on my custom dataset on Ubuntu 18.04. However, the issue does not come up with the coco128 dataset.
On Windows 10, I do not face this issue at all with either my custom dataset or coco128. Any thoughts on why this could be happening and where I should look to fix it?

@glenn-jocher
Member

glenn-jocher commented Oct 22, 2020

@21143 it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

@21143

21143 commented Oct 22, 2020

Update: I fixed my issue by deleting the train.cache and val.cache files in the labels folder and re-running training. I'm able to run the training code now. Thanks!
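Deleting the cache files forces the labels to be re-scanned on the next run. Cache names and locations differ between versions and between comments here (train.cache and val.cache in the labels folder, label.cache in the data folder), so this hypothetical helper removes every *.cache file under a directory you choose. Point it at the dataset folder only, since it deletes any matching file:

```python
from pathlib import Path
from typing import List


def remove_cache_files(root: str) -> List[Path]:
    """Delete every *.cache file below root and return the paths that were removed."""
    removed = []
    # sorted() materializes the matches before any file is deleted.
    for cache in sorted(Path(root).rglob("*.cache")):
        if cache.is_file():
            cache.unlink()
            removed.append(cache)
    return removed
```

For example, remove_cache_files("../face/labels") before re-running train.py.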

burglarhobbit pushed a commit to burglarhobbit/yolov5 that referenced this issue Jan 1, 2021
@xiaowo1996
Contributor

Training on my Windows machine works, but when I upload to a GPU server and train there, the error occurs. I fixed it by deleting the label.cache in the data folder.

KMint1819 pushed a commit to KMint1819/yolov5 that referenced this issue May 12, 2021
@xqneko

xqneko commented May 27, 2021

I just found another thing that can cause this error: blank lines in the label file. When I generated labels for YOLOv5 training, I wrote them out incorrectly, which left unintended blank lines between the objects in each label file. After I removed the blank lines from the label files, training worked normally again.

This makes sense, because the error happened while caching labels, not images.
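Each non-empty row of a YOLO label file holds class x_center y_center width height, with the four coordinates normalized to [0, 1], as in the 10000.txt row quoted at the top of this issue. A sketch with hypothetical helper names that rewrites a label file without blank lines and then validates every row:

```python
from pathlib import Path
from typing import List, Tuple

Row = Tuple[int, float, float, float, float]


def strip_blank_lines(path: Path) -> int:
    """Rewrite a label file without blank lines; return how many lines were dropped."""
    lines = path.read_text().splitlines()
    kept = [ln for ln in lines if ln.strip()]
    path.write_text("\n".join(kept) + ("\n" if kept else ""))
    return len(lines) - len(kept)


def parse_label_rows(path: Path) -> List[Row]:
    """Parse 'class x y w h' rows; raise ValueError on malformed or unnormalized rows."""
    rows: List[Row] = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        fields = line.split()
        if not fields:  # blank line: skipped here, removed on disk by strip_blank_lines
            continue
        if len(fields) != 5:
            raise ValueError(f"{path}:{lineno}: expected 5 fields, got {len(fields)}")
        cls = int(fields[0])
        x, y, w, h = (float(v) for v in fields[1:])
        if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
            raise ValueError(f"{path}:{lineno}: coordinates must lie in [0, 1]")
        rows.append((cls, x, y, w, h))
    return rows
```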

@github-actions github-actions bot removed the Stale label Jun 4, 2021
@github-actions
Contributor

github-actions bot commented Jul 5, 2021

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Access additional Ultralytics ⚡ resources:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@github-actions github-actions bot added the Stale label Jul 5, 2021
Lechtr pushed a commit to Lechtr/yolov5 that referenced this issue Jul 20, 2021
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this issue Aug 26, 2022
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this issue Aug 26, 2022
SecretStar112 added a commit to SecretStar112/yolov5 that referenced this issue May 24, 2023
@Farhad2590

[Screenshot 2023-07-28 190021]

How can I solve this problem?

@glenn-jocher
Member

@Farhad2590 hi there! It seems like you are running into an issue with the YOLOv5 training process. To help you out, could you please provide some more details about the specific problem you are facing? Specifically, any error messages or stack traces that you are encountering would be helpful in diagnosing the issue.

Additionally, please share the command or code that you are using for training, as well as any relevant information about your dataset and environment. With this information, we can better understand the problem and provide you with an appropriate solution.

Looking forward to assisting you further!

@Farhad2590

I am trying to train a YOLOv7 model:
%cd /content/drive/MyDrive/yolov7
!python train.py --workers 1 --device 0 --batch-size 4 --epochs 10 --img 640 640 --hyp data/hyp.scratch.custom.yaml --name yolov7-custom --weights yolov7.pt
Here is the code. I couldn't find any other error message; the output only shows these lines:

Transferred 554/560 items from yolov7.pt
Traceback (most recent call last):
  File "/content/drive/MyDrive/yolov7/train.py", line 616, in <module>
    train(hyp, opt, device, tb_writer)
  File "/content/drive/MyDrive/yolov7/train.py", line 98, in train
    train_path = data_dict['train']
KeyError: 'train'

I am running this in Google Colab on the free GPU.

@glenn-jocher
Member

@Farhad2590 make sure you have a data.yaml file specified and it is correctly formatted. This error typically occurs when the train key is missing or misnamed in the data.yaml file.

For example, the data.yaml file should look similar to this:

train: path/to/train.txt
val: path/to/val.txt
test: path/to/test.txt

nc: 80
names: ['class1', 'class2', ..., 'classN']

Ensure that you have correctly defined the path to your training dataset in the train field of the data.yaml file. Double-check the spelling and make sure the path to your training set is correct.

If this issue persists, please provide the content of your data.yaml file and any other relevant details that could help us further investigate the problem.
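A KeyError: 'train' means the dictionary loaded from the YAML file has no train key, for example because the key is misspelled or a different YAML file was loaded than intended. A minimal pre-flight check, assuming the file follows the example above; check_data_config and load_data_config are hypothetical helpers, and the set of required keys may differ between YOLOv5 and YOLOv7 versions:

```python
from typing import Any, Dict, List

# Keys taken from the example data.yaml above (an assumption; versions differ).
REQUIRED_KEYS = ("train", "val", "nc", "names")


def check_data_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in an already-loaded data.yaml mapping."""
    problems = [f"missing key: {key!r}" for key in REQUIRED_KEYS if key not in cfg]
    nc, names = cfg.get("nc"), cfg.get("names")
    if nc is not None and names is not None and len(names) != nc:
        problems.append(f"nc is {nc} but names has {len(names)} entries")
    return problems


def load_data_config(path: str) -> Dict[str, Any]:
    """Load a data.yaml file; requires PyYAML, imported lazily here."""
    import yaml

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f) or {}
```

Usage, with a hypothetical path: print(check_data_config(load_data_config("data/custom.yaml"))). An empty list means none of these checks failed.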

@sanjayjackson

detect: weights=['last.pt'], source=test, data=data\coco128.yaml, imgsz=[416, 416], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=True, save_csv=False, save_conf=True, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 2023-9-14 Python-3.8.0 torch-2.0.1+cpu CPU

Fusing layers...
Model summary: 157 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs
Traceback (most recent call last):
  File "detect.py", line 285, in <module>
    main(opt)
  File "detect.py", line 280, in main
    run(**vars(opt))
  File "C:\Users\sanjay\anaconda3\envs\pytorch-gpu\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "detect.py", line 101, in run
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
  File "H:\detection\yolov5-master\models\common.py", line 513, in __init__
    if names[0] == 'n01440764' and len(names) == 1000:  # ImageNet
KeyError: 0

@glenn-jocher
Member

@sanjayjackson this error typically occurs when there is an issue with your class labels in the provided data.yaml file. The error message suggests that there is a problem with the class label indexing.

To resolve this issue, please ensure the following:

  1. Verify that your data.yaml file is correctly formatted and contains the necessary information. Specifically, check that the names field is a list of class names and that it is not empty.

  2. Double-check the indices of your class labels. The error message indicates that there might be an issue with the indexing of the class labels. Ensure that the first class label in your names list has an index of 0. If you are using an external script to generate the data.yaml file, make sure it is correctly generating the class labels and their corresponding indices.

  3. If you are using a pre-trained model for detection, ensure that the data.yaml file provided matches the configuration of the pre-trained model.

By verifying the above points, you should be able to resolve the KeyError: 0 issue. If the problem persists, please provide more details about your setup and the specific steps you followed to encounter this error.
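Because names[0] raised KeyError rather than IndexError, names was a dict at that point (indexing an empty list raises IndexError), and it had no integer key 0, for example because its keys were the strings '0', '1', ... or started at 1. A sketch of a hypothetical normalizer, not a YOLOv5 function, that shows the difference:

```python
from typing import Dict, List, Union

Names = Union[List[str], Dict[Union[int, str], str]]


def names_to_dict(names: Names) -> Dict[int, str]:
    """Return class names keyed by integer index, so names[0] is the first class."""
    if isinstance(names, dict):
        # Keys loaded as strings ('0') are converted to ints (0).
        return {int(k): v for k, v in names.items()}
    return dict(enumerate(names))


raw = {'0': 'numberplate'}  # string keys, e.g. from a YAML mapping written as '0': numberplate
try:
    raw[0]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 0

print(names_to_dict(raw)[0])              # numberplate
print(names_to_dict(['numberplate'])[0])  # numberplate
```

Normalizing only fixes string keys; if the indices start at 1, the names entries themselves need to be renumbered from 0.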

@sanjayjackson

sanjayjackson commented Sep 22, 2023

Thanks, I tried, but it didn't work.
This is my data.yml file:

# Example usage: python train.py --data coco128.yaml
# parent
# ├── yolov5
# └── datasets
#     └── coco128  ← downloads here (7 MB)


# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
#path: ../datasets/coco128  # dataset root dir
train: H:\detection\yolov5-master\data\train_data\images\train  # train images (relative to 'path') 128 images
val: H:\detection\yolov5-master\data\train_data\images\val  # val images (relative to 'path') 128 images
#test:  # test images (optional)

# Classes
nc: 1
names: ['numberplate']
  

# Download script/URL (optional)
#download: https://ultralytics.com/assets/coco128.zip

###################################
What is the error?

@glenn-jocher
Member

@sanjayjackson it seems you are still encountering this issue with YOLOv5, even after modifying the data.yaml file. The error could have several causes.

Here are a few things you can try to resolve the issue:

  1. Ensure that the paths specified in the train and val fields of the data.yaml file are correct. Double-check the directory structure and confirm that the train and validation image folders are in the right location.

  2. Verify that the image file extensions are correct. YOLOv5 expects image files with certain extensions (e.g., .jpg, .png). Make sure all your images have the correct file extension.

  3. Check for misspellings or extra spaces in the class names defined in the names field. Note that YOLO label files refer to classes by integer index rather than by name, so the single class 'numberplate' must appear as class 0 in every label row.

  4. Confirm that the number of classes specified in the nc field matches the total number of classes in your dataset (in this case, 1 for the 'numberplate' class).

  5. If you are using a YAML file with UTF-8 encoding, ensure there are no hidden special characters that might be causing parsing issues. Try opening the file in a text editor that can display non-visible characters and remove any unwanted characters if found.

Please check these suggestions and let me know if the issue persists or if you have any further questions.
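Points 1 and 2 can be checked before launching anything. A sketch, assuming train and val are directory paths as in the data.yml above (the list and .txt-file forms mentioned in its comments are not handled); count_split_images and the extension set are assumptions, not YOLOv5's own list:

```python
from pathlib import Path
from typing import Dict, Iterable

# Assumed set of image extensions; YOLOv5's own list may differ by version.
IMG_SUFFIXES = {".bmp", ".jpeg", ".jpg", ".png", ".tif", ".tiff", ".webp"}


def count_split_images(cfg: Dict[str, str], splits: Iterable[str] = ("train", "val")) -> Dict[str, int]:
    """Count image files (searched recursively) in each split directory named in cfg.

    Raises FileNotFoundError if a split path is not an existing directory.
    """
    counts: Dict[str, int] = {}
    for split in splits:
        directory = Path(cfg[split])
        if not directory.is_dir():
            raise FileNotFoundError(f"{split}: {directory} is not an existing directory")
        counts[split] = sum(
            1 for p in directory.rglob("*") if p.is_file() and p.suffix.lower() in IMG_SUFFIXES
        )
    return counts
```

A count of 0 or a FileNotFoundError points at the path or extension problem. Windows paths such as H:\detection\... only resolve when the check runs on that Windows machine.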

10 participants