Datasets with different aspect ratios vs shuffling of dataset #932

Closed
shayanalibhatti opened this issue Sep 8, 2020 · 9 comments · Fixed by #5623

Comments

@shayanalibhatti

Hi,

Great work developing YOLOv5. I have a question. Imagine you are combining different datasets, such as COCO (which has images of different aspect ratios), with other datasets whose images all share a single aspect ratio within each dataset.

Wouldn't rectangular training (which sorts images by aspect ratio) hurt the shuffling of the dataset, since it groups images of the same aspect ratio together in one batch? It might therefore go through COCO first and then the other datasets serially. Or would simply turning rectangular training off and shuffling the dataset do the job, so that the model generalizes across datasets with images of different aspect ratios?

Can you elaborate on this and correct my understanding?
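
The concern above can be illustrated with a minimal sketch. It assumes a hypothetical mixed dataset (the image sizes and the `coco`/`fixed` source names are made up for illustration) and a simplified notion of rectangular batching that only sorts by aspect ratio and slices consecutive batches. It is not YOLOv5's actual dataloader code.

```python
import random

# Hypothetical mixed training set: (source, width, height).
# The sizes are illustrative and not taken from any real dataset listing.
images = [
    ("coco", 640, 480), ("coco", 500, 375), ("coco", 427, 640), ("coco", 640, 427),
    ("coco", 480, 640), ("coco", 640, 640), ("coco", 333, 500), ("coco", 500, 333),
] + [("fixed", 1920, 1080)] * 8


def aspect_ratio(image):
    """Height divided by width."""
    _, width, height = image
    return height / width


def make_batches(items, batch_size):
    """Split a list into consecutive batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def sources_per_batch(batches):
    """List the dataset names that appear in each batch."""
    return [sorted({source for source, _, _ in batch}) for batch in batches]


# Rectangular-style ordering: sort by aspect ratio, then batch consecutively.
rect_batches = make_batches(sorted(images, key=aspect_ratio), batch_size=4)

# Shuffled ordering for contrast (seeded only to make the run repeatable).
shuffled = images[:]
random.Random(0).shuffle(shuffled)
shuffled_batches = make_batches(shuffled, batch_size=4)

print("sorted:  ", sources_per_batch(rect_batches))
print("shuffled:", sources_per_batch(shuffled_batches))
```

In this toy setup every sorted batch draws from a single source, because all fixed-ratio images share one sort key and end up adjacent.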

@glenn-jocher
Member

With default settings this is a non-issue.

@glenn-jocher
Member

@shayanalibhatti good news 😃! Your original issue may now be fixed ✅ in PR #5623 by @werner-duvaud. This PR turns on shuffling in the YOLOv5 training DataLoader by default, which was missing until now. This works for all training formats: CPU, Single-GPU, Multi-GPU DDP.

train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
                                          hyp=hyp, augment=True, cache=opt.cache, rect=opt.rect, rank=LOCAL_RANK,
                                          workers=workers, image_weights=opt.image_weights, quad=opt.quad,
                                          prefix=colorstr('train: '), shuffle=True)  # <--- NEW
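
For readers wondering how one shuffle setting can cover CPU, single-GPU, and multi-GPU DDP runs, here is a rough standard-library sketch of the general idea: a per-epoch seeded permutation, with each process taking an interleaved slice of it. The function names are hypothetical, and the scheme is only loosely modelled on how distributed samplers commonly split data. It does not reproduce the code in PR #5623.

```python
import random


def epoch_indices(num_samples, epoch, seed=0):
    """Return a permutation of sample indices that depends on the epoch."""
    indices = list(range(num_samples))
    random.Random(seed + epoch).shuffle(indices)
    return indices


def rank_indices(num_samples, epoch, rank, world_size, seed=0):
    """Give each process an interleaved slice of the shared permutation."""
    return epoch_indices(num_samples, epoch, seed)[rank::world_size]


# Single process (CPU or single GPU): one shuffled pass over all samples.
single = epoch_indices(num_samples=8, epoch=0)

# Two processes (DDP-style): same permutation, disjoint slices.
rank0 = rank_indices(num_samples=8, epoch=0, rank=0, world_size=2)
rank1 = rank_indices(num_samples=8, epoch=0, rank=1, world_size=2)

print(single, rank0, rank1)
```

Seeding with the epoch number means every process derives the same permutation without communicating, while the order still changes from one epoch to the next.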

I evaluated this PR against master on VOC finetuning for 50 epochs, and the results show a slight improvement in most metrics and losses, particularly in objectness loss and mAP@0.5, perhaps indicating that the shuffle addition may help delay overtraining.

https://wandb.ai/glenn-jocher/VOC

To receive this update:

  • Git – git pull from within your yolov5/ directory, or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View the updated notebooks
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@shayanalibhatti
Author

shayanalibhatti commented Nov 15, 2021

@glenn-jocher Thanks. Glad to know my insight was helpful and will keep helping the community. I am not working on a computer-vision project right now, so I can't test it, but the improvement you observed in the results is great. Keep up the great work.

@yizweithree

How do I use shuffle in this new update? Or is shuffle turned on by default?

@glenn-jocher
Member

@yizweithree shuffle is enabled by default now for training sets.

@yangsiyu007

@glenn-jocher thanks for updating this and other threads on the issue! Can you elaborate more on the rationale for sorting the images by aspect ratio when doing "rectangular" training? With the above PR, images are still sorted by aspect ratio during evaluation on the val set, since currently the shuffle option will be turned off if rect is used. This is problematic for me because the images visualized in Weights and Biases will be from the same collection/with the most extreme aspect ratio. What would not sorting by aspect ratio affect? Thanks!

@glenn-jocher
Member

@yangsiyu007 I don't understand your question. shuffle is enabled for train, aspect ratio sorted in val for speed.
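
The interaction discussed here (shuffle for training, aspect-ratio order whenever rect is used) can be sketched as a small flag-resolution step. `resolve_shuffle` is a hypothetical helper written for illustration. The thread only states that shuffle is turned off when rect is used, not how the dataloader implements it.

```python
def resolve_shuffle(rect: bool, shuffle: bool) -> bool:
    """Hypothetical helper: rectangular batching relies on the sorted order,
    so a shuffle request is dropped whenever rect is enabled."""
    if rect and shuffle:
        print("rect is enabled, so shuffle is set to False")
        return False
    return shuffle


# Training with default settings: shuffle stays on.
train_shuffle = resolve_shuffle(rect=False, shuffle=True)

# Rectangular batching (as described for validation): order stays sorted.
val_shuffle = resolve_shuffle(rect=True, shuffle=True)

print(train_shuffle, val_shuffle)
```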

@yangsiyu007

@glenn-jocher I'm wondering why sorting by aspect ratio would speed up inference? :)

@glenn-jocher
Member

@yangsiyu007 you don't have to ask me, you can set rect=False in the val.py dataloader and profile both ways
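
Before profiling, a back-of-the-envelope estimate can show why sorted batches may be cheaper: when every image in a batch has a similar shape, the batch needs less padding. The sketch below assumes a simplified padding rule (scale the longer side to `img_size`, then round each batch up to a multiple of `stride`) and a made-up validation set. It is not the letterbox logic used in val.py, and it measures pixel counts, not wall-clock time.

```python
import math


def batch_shape(batch, img_size=640, stride=32):
    """Smallest stride-aligned (height, width) that fits every image in the
    batch after its longer side is scaled to img_size."""
    max_h = max_w = 0.0
    for width, height in batch:
        longest = max(width, height)
        max_h = max(max_h, height * img_size / longest)
        max_w = max(max_w, width * img_size / longest)
    return (math.ceil(max_h / stride) * stride,
            math.ceil(max_w / stride) * stride)


def total_pixels(images, batch_size):
    """Pixels processed over the whole set, padding included."""
    total = 0
    for i in range(0, len(images), batch_size):
        batch = images[i:i + batch_size]
        height, width = batch_shape(batch)
        total += height * width * len(batch)
    return total


# Hypothetical validation set: 8 landscape and 8 portrait images (width, height).
wide = [(1920, 1080)] * 8
tall = [(1080, 1920)] * 8

# Sorted by aspect ratio (height / width): similar shapes share a batch.
sorted_images = sorted(wide + tall, key=lambda wh: wh[1] / wh[0])

# Worst-case mixing: landscape and portrait alternate within each batch.
mixed_images = [img for pair in zip(wide, tall) for img in pair]

sorted_cost = total_pixels(sorted_images, batch_size=4)
mixed_cost = total_pixels(mixed_images, batch_size=4)
print(sorted_cost, mixed_cost, sorted_cost / mixed_cost)
```

In this toy setup the sorted order processes 40% fewer pixels than the alternating order. Real savings depend on the dataset's aspect-ratio spread, so profiling both settings, as suggested above, remains the reliable check.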
