With mosaic augmentation, are only 3 of the 4 images shuffled? #1016

bsugerman · 2020-09-22T15:23:11Z

Looking at the create_dataloader and load_mosaic functions, it looks like the first image in each mosaic always follows the dataset order, while the 2nd to 4th images are random choices from the full dataset. In other words, the first mosaic uses image[0] in the image list plus 3 random ones, the 2nd mosaic uses image[1] plus 3 random ones, and so on. However, the image list itself is never shuffled. Is that correct?
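The indexing described above can be modeled in plain Python. This is a simplified sketch, not the actual YOLOv5 code: the hypothetical `mosaic_indices` mimics load_mosaic putting the requested index in the first tile and drawing the other three uniformly, and the hypothetical `epoch_mosaics` mimics a loader that visits indices either in dataset order (no shuffle) or in a random permutation (shuffle=True).

```python
import random


def mosaic_indices(index, n, rng):
    """Hypothetical model of load_mosaic(): the requested index fills the first
    tile, and the other three tiles are drawn uniformly from the whole dataset."""
    return [index] + [rng.randint(0, n - 1) for _ in range(3)]


def epoch_mosaics(n, shuffle, seed=0):
    """Model one epoch: every index is requested once, either in dataset
    order (shuffle=False) or in a fresh random permutation (shuffle=True)."""
    rng = random.Random(seed)
    order = list(range(n))
    if shuffle:
        rng.shuffle(order)
    return [mosaic_indices(i, n, rng) for i in order]


if __name__ == "__main__":
    n = 8
    for shuffle in (False, True):
        firsts = [m[0] for m in epoch_mosaics(n, shuffle)]
        print(f"shuffle={shuffle}: first tiles {firsts}")
```

With shuffle=False the first tiles come out as 0, 1, ..., 7 in order, which is the behavior the question describes; with shuffle=True they are a permutation of the same indices.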


glenn-jocher · 2020-09-22T19:31:39Z

@bsugerman I looked into the code and yes, your interpretation is correct.

If you wanted to shuffle image 0 in the mosaic, you would pass an additional argument shuffle=True to the dataloader here:

yolov5/utils/datasets.py, lines 66 to 71 in 702c4fa:

dataloader = InfiniteDataLoader(dataset,
                                batch_size=batch_size,
                                num_workers=nw,
                                sampler=sampler,
                                pin_memory=True,
                                collate_fn=LoadImagesAndLabels.collate_fn)  # torch.utils.data.DataLoader()

glenn-jocher · 2020-09-22T19:32:17Z

@bsugerman BTW, if you notice any improved results with this change please let us know!

NanoCode012 · 2020-09-23T04:31:50Z

BTW, I don't think you should set shuffle in DDP mode, since the current code passes a sampler to the DataLoader.

sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.

From https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

One option would be to use RandomSampler (or something similar) for single-GPU training.
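A minimal sketch of that option, assuming a toy `TensorDataset` of 8 indices in place of LoadImagesAndLabels: DataLoader raises a ValueError when an explicit sampler is combined with shuffle=True, so on a single GPU the shuffling can come from passing `RandomSampler` as the sampler instead.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Toy stand-in for the training dataset: 8 "images" identified by index.
dataset = TensorDataset(torch.arange(8))

# Combining an explicit sampler with shuffle=True is rejected by DataLoader.
try:
    DataLoader(dataset, sampler=RandomSampler(dataset), shuffle=True)
    conflict_rejected = False
except ValueError:
    conflict_rejected = True

# Single-GPU option: RandomSampler draws a new permutation on every pass.
generator = torch.Generator().manual_seed(0)
loader = DataLoader(dataset, batch_size=4,
                    sampler=RandomSampler(dataset, generator=generator))

# Each batch is a 1-element list holding a tensor of indices.
epoch_order = [int(i) for (batch,) in loader for i in batch]
print("sampler + shuffle rejected:", conflict_rejected)
print("epoch order:", epoch_order)
```

Passing shuffle=True with no sampler has the same effect, because DataLoader then builds a RandomSampler internally.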


glenn-jocher · 2020-09-23T17:30:08Z

@NanoCode012 ah good point. In the end I doubt shuffling would have a significant impact since, as @bsugerman pointed out, 3 of the 4 images in every mosaic are already chosen at random, so the training inputs are about 75% shuffled even without DataLoader shuffling.

bsugerman · 2020-09-23T19:01:56Z

However, this also means that training uses each image an average of 4 times per epoch.
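The figure of 4 follows from the same simplified indexing model (an assumption, not the YOLOv5 code): each image fills the first tile of exactly one mosaic, and the 3n random draws per epoch land on each of the n images 3 times on average, giving 1 + 3 = 4. A quick standard-library simulation, with the hypothetical helper `usage_counts`:

```python
import random
from collections import Counter


def usage_counts(n, seed=0):
    """Count how often each image index appears across one epoch of n mosaics,
    assuming tile 0 is the mosaic's own index and tiles 1-3 are uniform draws."""
    rng = random.Random(seed)
    counts = Counter()
    for index in range(n):
        counts[index] += 1                      # fixed first tile
        for _ in range(3):
            counts[rng.randint(0, n - 1)] += 1  # three random tiles
    return [counts[i] for i in range(n)]


if __name__ == "__main__":
    counts = usage_counts(10_000)
    print("mean uses per image:", sum(counts) / len(counts))  # exactly 4.0
    print("min / max uses:", min(counts), max(counts))
```

The mean is exactly 4.0 by construction, but individual images vary: every image is used at least once, while some are drawn many more times than others.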


github-actions · 2020-10-24T00:48:53Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

glenn-jocher · 2021-11-13T12:09:16Z

@bsugerman @NanoCode012 good news 😃! Your original issue may now be fixed ✅ in PR #5623 by @werner-duvaud. This PR turns on shuffling in the YOLOv5 training DataLoader by default, which was missing until now. This works for all training formats: CPU, Single-GPU, Multi-GPU DDP.

train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
                                          hyp=hyp, augment=True, cache=opt.cache, rect=opt.rect, rank=LOCAL_RANK,
                                          workers=workers, image_weights=opt.image_weights, quad=opt.quad,
                                          prefix=colorstr('train: '), shuffle=True)  # <--- NEW
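In DDP mode the shuffling has to go through the sampler rather than the DataLoader. The sketch below shows one way to wire this and is not the code merged in PR #5623. It assumes a toy dataset, a hypothetical `make_loader` helper, and explicitly passed `num_replicas`/`rank`, so it runs without initializing a process group (in real DDP training these come from the launcher). `DistributedSampler(shuffle=True)` splits one shuffled permutation across ranks, shuffle is passed to DataLoader only when there is no sampler, and `set_epoch()` is called each epoch so the order changes between epochs.

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def make_loader(dataset, batch_size, shuffle, rank=-1, world_size=1):
    """Hypothetical helper: use a DistributedSampler in DDP mode (rank >= 0),
    otherwise let the DataLoader shuffle by itself. The two are never combined."""
    sampler = None
    if rank >= 0:
        sampler = DistributedSampler(dataset, num_replicas=world_size,
                                     rank=rank, shuffle=shuffle, seed=0)
    loader = DataLoader(dataset, batch_size=batch_size,
                        shuffle=shuffle and sampler is None, sampler=sampler)
    return loader, sampler


dataset = TensorDataset(torch.arange(8))

# Simulate the two ranks of a 2-GPU job inside one process.
loaders = [make_loader(dataset, 2, shuffle=True, rank=r, world_size=2)
           for r in range(2)]
for epoch in range(2):
    shards = []
    for loader, sampler in loaders:
        sampler.set_epoch(epoch)  # shuffle seed becomes seed + epoch
        shards.append([int(i) for (batch,) in loader for i in batch])
    print(f"epoch {epoch}: rank 0 {shards[0]}, rank 1 {shards[1]}")
```

Each epoch, the two ranks together cover all 8 indices exactly once, in an order that depends on the epoch number.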

I evaluated this PR against master on VOC finetuning for 50 epochs, and the results show a slight improvement in most metrics and losses, particularly in objectness loss and mAP@0.5, suggesting that shuffling may help delay overtraining.

https://wandb.ai/glenn-jocher/VOC

To receive this update:

Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub – Force-reload model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Notebooks – View updated notebooks
Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

bsugerman added the question label (Further information is requested) Sep 22, 2020

github-actions bot added the Stale label Oct 24, 2020

github-actions bot closed this as completed Oct 29, 2020

glenn-jocher mentioned this issue Jan 13, 2021

where is the shuffle in dataloader? #1919

Closed

glenn-jocher linked a pull request Nov 13, 2021 that will close this issue

Default DataLoader shuffle=True for training #5623

Merged
