Datasets with different aspect ratios vs shuffling of dataset #932

shayanalibhatti · 2020-09-08T15:15:19Z

Hi,

Great work developing YOLOv5. I have a question. Suppose you combine different datasets, such as COCO (whose images have varying aspect ratios) and other datasets in which all images share the same aspect ratio.

Wouldn't rectangular training (which sorts images by aspect ratio) hurt dataset shuffling, since it groups images of the same aspect ratio into the same batch? It might place COCO first and then the other datasets one after another. Or would turning rectangular training off and shuffling the dataset be enough for the model to generalize across datasets with images of different aspect ratios?

Can you elaborate on this and correct my understanding?
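
To make the concern concrete, here is a minimal sketch in plain Python. It is not YOLOv5 code: the dataset names, image sizes, and batch size are hypothetical, and the "rectangular" ordering is approximated by sorting on height / width and batching consecutive images. It counts how many batches draw from only one source dataset under each ordering.

```python
import random

# Hypothetical datasets as (source, width, height); values are illustrative only.
coco_like = [("coco", w, h)
             for w, h in [(640, 480), (480, 640), (640, 427), (500, 375)] * 4]
fixed_ratio = [("other", 1920, 1080)] * 16
images = coco_like + fixed_ratio


def batches(items, batch_size):
    """Split a list into consecutive batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def sources(batch):
    """Set of source datasets represented in a batch."""
    return {name for name, _, _ in batch}


# Rectangular-style ordering (assumption): sort by aspect ratio h / w, then batch.
rect_batches = batches(sorted(images, key=lambda im: im[2] / im[1]), 8)

# Shuffled ordering: random permutation, then batch.
rng = random.Random(0)
shuffled = images[:]
rng.shuffle(shuffled)
shuffled_batches = batches(shuffled, 8)

rect_single_source = sum(len(sources(b)) == 1 for b in rect_batches)
shuffled_single_source = sum(len(sources(b)) == 1 for b in shuffled_batches)
print("single-source batches (sorted):  ", rect_single_source, "of", len(rect_batches))
print("single-source batches (shuffled):", shuffled_single_source, "of", len(shuffled_batches))
```

In this toy setup every sorted batch comes from a single dataset, because all 16:9 images share one aspect ratio and therefore sit next to each other after sorting. Whether that matters in practice depends on whether rectangular training is actually enabled.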

glenn-jocher · 2020-09-08T19:28:50Z

With default settings this is a non-issue.

glenn-jocher · 2021-11-13T12:09:25Z

@shayanalibhatti good news 😃! Your original issue may now be fixed ✅ in PR #5623 by @werner-duvaud. This PR turns on shuffling in the YOLOv5 training DataLoader by default, which was missing until now. This works for all training formats: CPU, Single-GPU, Multi-GPU DDP.

train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
                                          hyp=hyp, augment=True, cache=opt.cache, rect=opt.rect, rank=LOCAL_RANK,
                                          workers=workers, image_weights=opt.image_weights, quad=opt.quad,
                                          prefix=colorstr('train: '), shuffle=True)  # <--- NEW

I evaluated this PR against master on VOC finetuning for 50 epochs, and the results show a slight improvement in most metrics and losses, particularly in objectness loss and mAP@0.5, perhaps indicating that the shuffle addition may help delay overtraining.

https://wandb.ai/glenn-jocher/VOC

To receive this update:

Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub – Force-reload model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Notebooks – View updated notebooks
Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

shayanalibhatti · 2021-11-15T02:58:33Z

@glenn-jocher Thanks. Glad to know my insight was helpful and will continue to be for the community. I am not working on a computer-vision project right now, so I can't test it, but the improvement you observed is great. Keep up the great work.

yizweithree · 2021-11-21T10:05:32Z

How do I use shuffle in this new update? Or is shuffle turned on by default?

glenn-jocher · 2021-11-21T14:05:38Z

@yizweithree shuffle is enabled by default now for training sets.

yangsiyu007 · 2021-11-25T02:39:46Z

@glenn-jocher thanks for updating this and other threads on the issue! Can you elaborate on the rationale for sorting images by aspect ratio in "rectangular" training? With the above PR, images are still sorted by aspect ratio during evaluation on the val set, since shuffle is currently turned off when rect is used. This is a problem for me because the images visualized in Weights and Biases all come from the same collection, or are the ones with the most extreme aspect ratios. What would be affected by not sorting by aspect ratio? Thanks!

glenn-jocher · 2021-11-25T07:59:47Z

@yangsiyu007 I don't understand your question. Shuffle is enabled for train; the val set is sorted by aspect ratio for speed.

yangsiyu007 · 2021-11-25T08:05:27Z

@glenn-jocher I'm wondering why sorting by aspect ratio would speed up inference? :)

glenn-jocher · 2021-11-25T08:10:46Z

@yangsiyu007 you don't have to ask me, you can set rect=False in the val.py dataloader and profile both ways
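
As a rough, hedged illustration of why sorting can help (not a substitute for the profiling suggested above), the sketch below counts padded pixels per batch. It assumes each batch is padded to its largest height and width, rounded up to a multiple of a stride of 32. The image shapes, batch size, and helper names are hypothetical, and padded pixel count is only a proxy for inference time.

```python
import math


def padded_pixels(batch, stride=32):
    """Pixels processed if every image in the batch is padded to the batch's
    largest height and width, each rounded up to a multiple of `stride`."""
    h = max(math.ceil(img_h / stride) * stride for img_h, _ in batch)
    w = max(math.ceil(img_w / stride) * stride for _, img_w in batch)
    return h * w * len(batch)


def total_padded_pixels(shapes, batch_size):
    """Sum padded pixels over consecutive batches of `shapes`."""
    return sum(padded_pixels(shapes[i:i + batch_size])
               for i in range(0, len(shapes), batch_size))


# Hypothetical (height, width) pairs after resizing the long side to 640.
shapes = [(320, 640), (640, 320), (480, 640), (640, 480)] * 16

# Mixed order: every batch holds both wide and tall images, so it pads to 640x640.
unsorted_total = total_padded_pixels(shapes, batch_size=16)

# Sorted by aspect ratio h / w: each batch holds similar shapes and pads less.
sorted_total = total_padded_pixels(sorted(shapes, key=lambda s: s[0] / s[1]),
                                   batch_size=16)

print(unsorted_total, sorted_total, sorted_total / unsorted_total)
# prints: 26214400 16384000 0.625
```

In this toy case the sorted batches process 62.5% of the pixels of the mixed batches. Real gains depend on the dataset's aspect-ratio distribution and the hardware, which is why measuring both settings is the reliable check.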

shayanalibhatti closed this as completed Sep 15, 2020

glenn-jocher mentioned this issue Jan 13, 2021: where is the shuffle in dataloader? #1919 (Closed)

glenn-jocher linked a pull request Nov 13, 2021 that will close this issue: Default DataLoader shuffle=True for training #5623 (Merged)
