
Is dataset.indices broadcast necessary? #1820

Closed
ardeal opened this issue Aug 21, 2021 · 10 comments
Labels: question (Further information is requested), Stale

Comments

ardeal commented Aug 21, 2021

Hi,

In train.py of the yolov3 code:

            # Broadcast if DDP
            if rank != -1:
                indices = (torch.tensor(dataset.indices) if rank == 0 else torch.zeros(dataset.n)).int()
                dist.broadcast(indices, 0)
                if rank != 0:
                    dataset.indices = indices.cpu().numpy()

Since the dataloader already uses sampler = torch.utils.data.distributed.DistributedSampler(dataset) if rank != -1 else None, I think the manual broadcast of dataset.indices is not necessary in this case, as DistributedSampler will broadcast the dataset automatically. Is that right?
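
For context, the sampler part of the dataloader looks roughly like this (only a sketch; dataset, batch_size and rank come from the surrounding training script):

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# In DDP mode each rank builds its own DataLoader, and the DistributedSampler
# hands each rank a different subset of sample indices every epoch.
sampler = DistributedSampler(dataset) if rank != -1 else None
dataloader = DataLoader(dataset,
                        batch_size=batch_size,
                        sampler=sampler,
                        shuffle=(sampler is None))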

ardeal added the question (Further information is requested) label on Aug 21, 2021
glenn-jocher (Member) commented

@ardeal I'm not sure. This would need some empirical results to verify (testing). Can you try to run the code both ways and check if dataset indices are updated on all workers?

ardeal (Author) commented Aug 23, 2021

Hi,
@glenn-jocher ,

I have reviewed the torch.distributed tutorial, and I am currently asking the question on the PyTorch forum:
https://discuss.pytorch.org/t/when-will-dist-all-reduce-will-be-called/129918/6

According to torch's explanation, everything is done by torch itself. The only thing the user needs to do is feed the model into DDP.

Please check this code: https://github.com/rwightman/pytorch-image-models/blob/master/train.py
In the above code, the dataset is not broadcast; however, it calls all_reduce to sync the loss (line 689 of that file).
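
The usual pattern for that (just a sketch in my own words, not copied from that file; loss and world_size are assumed to exist in the training loop):

import torch.distributed as dist

def average_across_ranks(value, world_size):
    # Sum the tensor from every rank, then divide to get the mean (for logging only).
    t = value.clone()
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    return t / world_size

# inside the training loop:
# reduced_loss = average_across_ranks(loss.detach(), world_size)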

What confuses me is:

  1. When should the dataset be broadcast?
  2. When should all_reduce be called?
  3. If manual all_reduce calls and DDP's weight syncing are both in place, will they conflict?

I am studying torch.distributed; once I understand it clearly, I will tell you.

glenn-jocher (Member) commented

@ardeal ok got it. Yes please let us know what you find.

One point is that the code you show is only used with --image-weights. This is because the image weights are reassigned after every validation at the end of each training epoch, and then dataset.indices is updated, which determines which images are loaded the next epoch.

Normally dataset.indices is just range(n) for n images, and the array never changes.

ardeal (Author) commented Aug 23, 2021

@glenn-jocher

Regarding "which determines which images are loaded the next epoch": according to my understanding, this is handled by torch.distributed, so we don't need to do it manually. Perhaps my understanding is wrong; that is why I am confirming it on the PyTorch forum.

You could review the pytorch-image-models repo linked above, which does as much data augmentation as your repo but does not broadcast the dataset.

The reason I asked about broadcasting the dataset is that your repo and pytorch-image-models handle it differently: one broadcasts and the other does not.

ardeal (Author) commented Aug 25, 2021

@glenn-jocher ,

Please check the answer on the PyTorch forum:
https://discuss.pytorch.org/t/when-will-dist-all-reduce-will-be-called/129918/11

According to the answer, you don't need to broadcast the dataset.
My understanding is:

  1. The dataset and weights will be broadcast by DDP automatically if we run a standard model.
  2. If something else needs to be broadcast or synced, we can call the broadcast or all_reduce function (see the sketch below).
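
For case 2, besides the tensor-based dist.broadcast already used in train.py, newer PyTorch versions (1.8+) also provide dist.broadcast_object_list for arbitrary picklable Python objects. A minimal sketch, assuming an initialized process group:

import torch.distributed as dist

# Broadcast rank 0's dataset.indices (a plain Python list) to every other rank.
objects = [dataset.indices if rank == 0 else None]
dist.broadcast_object_list(objects, src=0)
if rank != 0:
    dataset.indices = objects[0]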

@glenn-jocher
Copy link
Member

glenn-jocher commented Aug 26, 2021

@ardeal thanks for the explanation! So is your recommendation that we update this section:

yolov3/train.py

Lines 252 to 265 in 1be3170

        # Update image weights (optional)
        if opt.image_weights:
            # Generate indices
            if rank in [-1, 0]:
                cw = model.class_weights.cpu().numpy() * (1 - maps) ** 2 / nc  # class weights
                iw = labels_to_image_weights(dataset.labels, nc=nc, class_weights=cw)  # image weights
                dataset.indices = random.choices(range(dataset.n), weights=iw, k=dataset.n)  # rand weighted idx
            # Broadcast if DDP
            if rank != -1:
                indices = (torch.tensor(dataset.indices) if rank == 0 else torch.zeros(dataset.n)).int()
                dist.broadcast(indices, 0)
                if rank != 0:
                    dataset.indices = indices.cpu().numpy()

To this?

        # Update image weights (optional)
        if opt.image_weights:
            # Generate indices
            if RANK in [-1, 0]:
                cw = model.class_weights.cpu().numpy() * (1 - maps) ** 2 / nc  # class weights
                iw = labels_to_image_weights(dataset.labels, nc=nc, class_weights=cw)  # image weights
                dataset.indices = random.choices(range(dataset.n), weights=iw, k=dataset.n)  # rand weighted idx

or this?

        # Update image weights (optional)
        if opt.image_weights:
            cw = model.class_weights.cpu().numpy() * (1 - maps) ** 2 / nc  # class weights
            iw = labels_to_image_weights(dataset.labels, nc=nc, class_weights=cw)  # image weights
            dataset.indices = random.choices(range(dataset.n), weights=iw, k=dataset.n)  # rand weighted idx

glenn-jocher (Member) commented

@ardeal I've eliminated the DDP portion of the image_weights code in YOLOv5 PR ultralytics/yolov5#4579, as the two are incompatible (I'd forgotten this in our earlier discussions), so this issue should be resolved there now, and these updates will be backpropagated to YOLOv3 eventually.

One question I had for you along the DDP topic is my new YOLOv5 EarlyStopping PR ultralytics/yolov5#4576. It works well for Single-GPU, but in DDP mode the stop variable is not broadcast correctly to all ranks, and training hangs rather than ending. Do you have any ideas on how to implement this correctly?

The idea is that the stopper returns stop=True after patience is exceeded, and then we break the training loop, but I think only the RANK -1 and 0 processes are actually breaking.

ardeal (Author) commented Aug 29, 2021

Hi @glenn-jocher ,
I agree with eliminating the DDP portion of the image_weights code.

For the issue of syncing the stop variable, I think 2 experiments need to be done:

  1. Print the stop signal/variable on each GPU to confirm that every rank receives it.
  2. If each GPU receives the stop variable, I think the loop on each GPU should exit. If some GPUs don't exit, I can't figure out the reason. One suggestion: in the master process, run

import torch.distributed as dist

torch.cuda.empty_cache()
dist.destroy_process_group()

glenn-jocher (Member) commented

@ardeal thanks! Yes, the problem is that the other RANKs do not receive the stop variable. It's not clear to me how to broadcast the Python variable correctly.
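
I'm wondering if something like this is the right approach (rough sketch, untested; assumes RANK and an initialized process group):

import torch
import torch.distributed as dist

def broadcast_stop(stop, rank):
    # Share rank 0's stop decision so every rank breaks the training loop together.
    if rank != -1:
        flag = torch.tensor([int(stop)], device='cuda')  # NCCL backend needs a CUDA tensor
        dist.broadcast(flag, src=0)  # every rank receives rank 0's value
        stop = bool(flag.item())
    return stop

# every rank, once per epoch:
# stop = broadcast_stop(stop, RANK)
# if stop:
#     break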

The dist.destroy_process_group() call needs to take place outside of the train function; I've placed it in the caller, otherwise the process hangs and training does not finish.

https://github.com/ultralytics/yolov5/blob/bbfafeabdbf7785f8da5e4f9880df27869a71218/train.py#L507-L512

github-actions commented

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv3 🚀 and Vision AI ⭐!
