where is the shuffle in dataloader? #1919

Closed
Laughing-q opened this issue Jan 13, 2021 · 3 comments · Fixed by #5623
Labels
question Further information is requested

Comments

@Laughing-q

❔Question

It seems there is no shuffling when the dataloader is created. Is this better, or did I just not find where the shuffle happens?

Additional context

Maybe shuffling is not that important when we use mosaic, which is enabled by default, but it is important when training without any augmentation or mosaic.

@Laughing-q Laughing-q added the question Further information is requested label Jan 13, 2021
@glenn-jocher

glenn-jocher commented Jan 13, 2021

@Laughing-q yes, shuffle=False by default when training and testing. There are a few threads on this topic, but I believe everything is fine, because 3/4 of the images in each mosaic are drawn at random, so in reality 75% of the content is effectively shuffled with the 4-mosaic, or 8/9 if the 9-mosaic is used.
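The 3/4 and 8/9 figures can be illustrated with a small sketch. It assumes each mosaic is built from one anchor image (the index the dataloader asks for) plus randomly drawn partner images; the names `mosaic_indices` and `random_fraction` are hypothetical and only illustrate the argument, they are not taken from the repository.

```python
import random

def mosaic_indices(index, dataset_size, k=3, rng=random):
    """Return the anchor index followed by k randomly drawn partner indices.

    Hypothetical helper: k=3 models a 4-mosaic, k=8 a 9-mosaic.
    """
    return [index] + rng.choices(range(dataset_size), k=k)

def random_fraction(tiles_per_mosaic):
    """Fraction of tiles that are drawn at random rather than fixed by the anchor."""
    return (tiles_per_mosaic - 1) / tiles_per_mosaic

rng = random.Random(0)
for index in range(3):  # sequential anchors, as with shuffle=False
    print(index, mosaic_indices(index, dataset_size=100, rng=rng))

print(random_fraction(4))  # 0.75 for the 4-mosaic
print(random_fraction(9))  # 8/9 for the 9-mosaic
```

Even with sequential anchors, only the first tile of each mosaic follows dataset order. Without mosaic there are no random partners, which is the case the original question is concerned about.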

@Laughing-q

I see, thanks

@glenn-jocher glenn-jocher linked a pull request Nov 13, 2021 that will close this issue
@glenn-jocher

glenn-jocher commented Nov 13, 2021

@Laughing-q good news 😃! Your original issue may now be fixed ✅ in PR #5623 by @werner-duvaud. This PR turns on shuffling in the YOLOv5 training DataLoader by default, which was missing until now. This works for all training formats: CPU, Single-GPU, Multi-GPU DDP.

train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
                                          hyp=hyp, augment=True, cache=opt.cache, rect=opt.rect, rank=LOCAL_RANK,
                                          workers=workers, image_weights=opt.image_weights, quad=opt.quad,
                                          prefix=colorstr('train: '), shuffle=True)  # <--- NEW

I evaluated this PR against master on VOC finetuning for 50 epochs, and the results show a slight improvement in most metrics and losses, particularly in objectness loss and mAP@0.5, perhaps indicating that the shuffle addition may help delay overtraining.
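For readers unfamiliar with what shuffling means across single-process and multi-GPU training, here is a minimal pure-Python sketch of the idea. It is not the `create_dataloader` implementation: `epoch_order` and `rank_shard` are hypothetical names, and the scheme (reseed per epoch, then give each rank a strided slice) only mirrors the general approach of a distributed sampler.

```python
import random

def epoch_order(n, epoch, shuffle=True, seed=0):
    """Return the sample order for one epoch.

    With shuffle=True the order is a permutation seeded by seed + epoch, so every
    process computes the same permutation and it changes from epoch to epoch.
    """
    order = list(range(n))
    if shuffle:
        random.Random(seed + epoch).shuffle(order)
    return order

def rank_shard(order, rank, world_size):
    """Give each process a disjoint, strided slice of the shared order."""
    return order[rank::world_size]

n, world_size = 8, 2
for epoch in range(2):
    order = epoch_order(n, epoch)
    shards = [rank_shard(order, r, world_size) for r in range(world_size)]
    print(epoch, shards)
```

With `shuffle=False` the order is simply `0..n-1` every epoch, which corresponds to the behaviour before this change.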


https://wandb.ai/glenn-jocher/VOC

To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View updated notebooks
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
