Specifying Label Path in Customized Dataset #8246

Closed
1 task done
bryanbocao opened this issue Jun 17, 2022 · 12 comments
Labels
question Further information is requested Stale

Comments

@bryanbocao

Question

Hello! I like the way this repo is organized! I am trying to run a kind of "grid search" to investigate the performance of YOLO. Specifically, I have a base COCO dataset, exactly as downloaded by the script in coco.yaml, and would like to vary it on two levels: (1) at the image level, I apply some image processing and get different image sets, say images_v2, images_v3, and images_v4, with images as the base set; (2) at the label level for bounding boxes, I also have variations, such as changed label names, number of classes, or category IDs, saved in separate label folders: labels_v2, labels_v3, and labels_v4.

Below is a brief structure of files in dataset/coco:

images
images_v2
images_v3
images_v4
labels
labels_v2
labels_v3
labels_v4
train2017.txt
val2017.txt
test-dev2017.txt

By "grid search" I mean one result for each pair of images* and labels*, giving 4 (images) x 4 (labels) = 16 sets of experiments in total.

Q1: Is there any way to do that efficiently?

A straightforward way is to keep 16 copies of the COCO dataset, such as coco_1, coco_2, coco_3, each corresponding to one pair of images* and labels*. However, that requires 16 x 20.1 GB = 321.6 GB of space, which is too much for me.

When sweeping images*, it seems that I can just change the image paths in train2017.txt and val2017.txt, but the default label path is labels, and I don't see how to specify a different one in https://github.com/ultralytics/yolov5/blob/master/data/coco.yaml. Q2: Is there any way to do that?
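The image-side sweep described here could be sketched roughly as follows. This is an illustration only: `rewrite_image_list` is a hypothetical helper, and it assumes each line of train2017.txt is a `/`-separated path that contains a component named exactly `images`. It handles the image paths only; the label lookup discussed later in this thread keys on `/images/`, so the label side needs separate handling.

```python
def rewrite_image_list(lines, images_dir="images_v2"):
    """Return image-list lines with the 'images' path component replaced.

    Assumption: paths use '/' separators, as in a typical train2017.txt.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("/")
        # Replace only components that are exactly "images", not substrings.
        parts = [images_dir if p == "images" else p for p in parts]
        out.append("/".join(parts))
    return out
```

The returned lines could then be written to a new list file (for example a hypothetical train2017_v2.txt) that a dataset YAML points to.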

Appreciate your help!

@bryanbocao bryanbocao added the question Further information is requested label Jun 17, 2022
@glenn-jocher
Member

glenn-jocher commented Jun 17, 2022

@bryanbocao 👋 Hello! Thanks for asking about YOLOv5 🚀 dataset formatting. You could just use one data.yaml and a bash script to rename your directories between each of the 16 trainings.
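One way to read the rename suggestion is sketched below, in Python rather than bash. It is only a sketch: `swapped`, `run_grid`, and the `train` callback are hypothetical names, and it assumes every variant directory sits directly under the dataset root next to `images` and `labels`. Each variant is temporarily swapped into the default directory name, the training callback runs, and the swap is undone afterwards.

```python
import itertools
import os
from contextlib import contextmanager


def _swap_dirs(root, name_a, name_b):
    """Exchange two sibling directories under root via a temporary name."""
    a = os.path.join(root, name_a)
    b = os.path.join(root, name_b)
    tmp = os.path.join(root, name_a + "_swap_tmp")
    os.rename(a, tmp)
    os.rename(b, a)
    os.rename(tmp, b)


@contextmanager
def swapped(root, active, variant):
    """Temporarily make `variant` (e.g. labels_v2) available as `active` (e.g. labels)."""
    if active == variant:
        yield  # base variant: nothing to swap
        return
    _swap_dirs(root, active, variant)
    try:
        yield
    finally:
        _swap_dirs(root, active, variant)  # restore the original layout


def run_grid(root, image_dirs, label_dirs, train):
    """Call train(images_variant, labels_variant) once per pair."""
    for img, lbl in itertools.product(image_dirs, label_dirs):
        with swapped(root, "images", img), swapped(root, "labels", lbl):
            train(img, lbl)
```

In practice `train` might launch `python train.py --data coco.yaml --name <run name>` through `subprocess.run`. YOLOv5 may also keep a `*.cache` file of parsed labels, so it is probably safest to delete such files between runs; that detail is worth verifying against the version in use.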

For examples of using image directories instead of txt lists of images see other datasets like VOC.yaml:

yolov5/data/VOC.yaml

Lines 1 to 21 in d605138

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
# PASCAL VOC dataset http://host.robots.ox.ac.uk/pascal/VOC by University of Oxford
# Example usage: python train.py --data VOC.yaml
# parent
# ├── yolov5
# └── datasets
#     └── VOC  ← downloads here (2.8 GB)
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/VOC
train: # train images (relative to 'path') 16551 images
  - images/train2012
  - images/train2007
  - images/val2012
  - images/val2007
val: # val images (relative to 'path') 4952 images
  - images/test2007
test: # test images (optional)
  - images/test2007

To train correctly your data must be in YOLOv5 format. Please see our Train Custom Data tutorial for full documentation on dataset setup and all steps required to start training your first model. A few excerpts from the tutorial:

1.1 Create dataset.yaml

COCO128 is an example small tutorial dataset composed of the first 128 images in COCO train2017. These same 128 images are used for both training and validation to verify our training pipeline is capable of overfitting. data/coco128.yaml, shown below, is the dataset config file that defines 1) the dataset root directory path and relative paths to train / val / test image directories (or *.txt files with image paths), 2) the number of classes nc and 3) a list of class names:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
nc: 80  # number of classes
names: [ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
         'hair drier', 'toothbrush' ]  # class names

1.2 Create Labels

After using a tool like Roboflow Annotate to label your images, export your labels to YOLO format, with one *.txt file per image (if no objects in image, no *.txt file is required). The *.txt file specifications are:

  • One row per object
  • Each row is class x_center y_center width height format.
  • Box coordinates must be in normalized xywh format (from 0 - 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
  • Class numbers are zero-indexed (start from 0).
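The normalization in the list above can be sketched as a small helper. This assumes the source boxes are given as pixel corner coordinates (x_min, y_min, x_max, y_max), which is an assumption about the annotation tool's output; `to_yolo_row` is a hypothetical name, not a YOLOv5 function.

```python
def to_yolo_row(cls, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to one 'class x_center y_center width height' row."""
    x_center = (x_min + x_max) / 2 / img_w   # normalize by image width
    y_center = (y_min + y_max) / 2 / img_h   # normalize by image height
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{cls} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 px box with top-left corner (100, 200) in a 640x480 image, class 0:
print(to_yolo_row(0, 100, 200, 300, 400, 640, 480))
# -> 0 0.312500 0.625000 0.312500 0.416667
```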

Image Labels

The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27):

1.3 Organize Directories

Organize your train and val images and labels according to the example below. YOLOv5 assumes /coco128 is inside a /datasets directory next to the /yolov5 directory. YOLOv5 locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/. For example:

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label
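The lookup rule described above can be illustrated with a short sketch. It follows the stated rule (replace the last `/images/` in the path with `/labels/` and use a `.txt` extension); `label_path_for` is a hypothetical name, and the code is an illustration of the rule rather than the repository's own implementation.

```python
import os


def label_path_for(image_path):
    """Map an image path to its label path by swapping the last /images/ for /labels/."""
    sa = f"{os.sep}images{os.sep}"
    sb = f"{os.sep}labels{os.sep}"
    # rsplit(..., 1) splits at the LAST occurrence only, so earlier
    # 'images' components in the path are left untouched.
    swapped = sb.join(image_path.rsplit(sa, 1))
    return os.path.splitext(swapped)[0] + ".txt"


example = os.sep.join(["..", "datasets", "coco128", "images", "im0.jpg"])
print(label_path_for(example))
```

Because only the last `/images/` is replaced, a path with an earlier component named `images` keeps that component unchanged.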

Good luck 🍀 and let us know if you have any other questions!

@bryanbocao
Author

bryanbocao commented Jun 17, 2022

@glenn-jocher, thanks for pointing to it again. I have read this document and used it successfully with different custom datasets many times, but I am afraid it doesn't answer my specific question. The document covers one dataset, while I am asking about N variants of one dataset that share the same dataset root dir without duplicating the data.

@bryanbocao
Author

bryanbocao commented Jun 17, 2022

In the above example,

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label

The folder name labels seems to be fixed by default. This document does not specify how to change it if I have labels_v2 or labels_v3 in the same folder:

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label
../datasets/coco128/labels_v2/im0.txt  # label_v2
../datasets/coco128/labels_v3/im0.txt  # label_v3

Thanks!
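A possible workaround that avoids copying data is to give each images/labels pair its own small "view" directory containing two symbolic links named `images` and `labels`, so that the fixed `/images/` to `/labels/` rule still applies. The sketch below is an untested idea, not something confirmed in this thread: `make_view` and the `views` directory are hypothetical, and it assumes the data loader keeps the symlinked path rather than resolving it to the real directory.

```python
import os


def make_view(root, images_variant, labels_variant):
    """Create root/views/<images>__<labels>/ with 'images' and 'labels' symlinks."""
    view = os.path.join(root, "views", f"{images_variant}__{labels_variant}")
    os.makedirs(view, exist_ok=True)
    for link_name, target in (("images", images_variant), ("labels", labels_variant)):
        link = os.path.join(view, link_name)
        if not os.path.islink(link):
            # Link to the absolute path so the view works from any working directory.
            os.symlink(os.path.abspath(os.path.join(root, target)), link,
                       target_is_directory=True)
    return view
```

Each of the 16 pairs would then need its own dataset YAML (or image list) whose paths go through the view directory, for example `views/images_v2__labels_v3/images/...`. The views cost almost no disk space because they hold only links.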

@github-actions
Contributor

github-actions bot commented Jul 19, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@GusevMihail

Hello, bryanbocao! Did you find a way to solve your problem? I have the same problem now. I would be grateful if you could share your experience.

@chobits

chobits commented Oct 19, 2022

I have the same problem: I want to specify the path of the labels directory.
However, judging from the source code, this feature is not currently supported, because the labels directory is derived automatically from the images directory path (xxx/images/xxx), which is what the official documentation says.

See

self.label_files = img2label_paths(self.im_files) # labels

and

sa, sb = f'{os.sep}images{os.sep}', f'{os.sep}labels{os.sep}' # /images/, /labels/ substrings
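Building on the two quoted lines, a local modification along the following lines might make the directory names configurable. This is a hypothetical sketch, not the change anyone in this thread actually made and not an official YOLOv5 option: `img2label_paths_custom`, `images_dir`, and `labels_dir` are illustrative names.

```python
import os


def img2label_paths_custom(img_paths, images_dir="images", labels_dir="labels"):
    """Derive label paths from image paths with configurable directory names.

    With the defaults this follows the /images/ -> /labels/ rule; passing
    e.g. images_dir="images_v2", labels_dir="labels_v3" maps one variant
    onto another.
    """
    sa = f"{os.sep}{images_dir}{os.sep}"
    sb = f"{os.sep}{labels_dir}{os.sep}"
    # Replace the last occurrence of sa with sb, then swap the extension for .txt.
    return [sb.join(p.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt" for p in img_paths]
```

One thing this makes visible: if an image path contains `/images_v2/` but the function still searches for `/images/`, nothing is replaced and the derived label path ends up next to the image, so the image and label directory names have to be chosen together.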

@glenn-jocher
Member

@chobits hello! Thank you for bringing that to our attention. Deriving the labels directory automatically from the images directory is indeed the current behavior. While specifying a separate path for labels isn't currently supported, your feedback has been noted and will be taken into account for future improvements.

Feel free to keep an eye on the release notes and documentation updates for any future changes. We appreciate your understanding and patience!

@Akshaykushawaha

Any updates on this?
Seems like a small fix that should have been made by now, since it is one of the most basic input features.

@chobits

chobits commented May 13, 2024

I fixed it by modifying my local branch at the time. It's been a long while, so I no longer recall the details. I haven't checked whether there has been an official update, but still, thank you for your work.

@glenn-jocher
Member

Hello! Thanks for checking back on this. As of now, there hasn't been an official update to support specifying separate paths for the labels directory directly through the configuration. We understand the importance of this feature and appreciate your input, which helps in enhancing the functionality of YOLOv5.

If there are updates regarding this feature, they'll be included in the release notes and documentation. Thanks once again for your patience and for being a part of the YOLOv5 community! 🌟

@chobits

chobits commented May 13, 2024

Ok, I understood.

Cool! Looking forward to seeing the new features.

@glenn-jocher
Member

Hello! We're glad to hear your enthusiasm and appreciate your support! We'll definitely keep the community updated on any new features and enhancements. If you have any more questions or need further assistance in the meantime, don't hesitate to ask. Happy coding! 😊🚀


5 participants