
Device mismatch when an image in the input batch gives 0 detections at inference time #1617

Closed
baldassarreFe opened this issue Dec 6, 2020 · 5 comments · Fixed by #1619
Labels
bug Something isn't working

Comments

@baldassarreFe

🐛 Bug

The bug happens when:

  • yolo is loaded from torch hub (I haven't tried otherwise)
  • yolo is placed on a cuda device
  • autoshape is enabled
  • no objects are detected for one image of the batch

To Reproduce

Input:

import torch
import PIL.Image

yolo = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
yolo.to('cuda:0')
yolo = yolo.autoshape()

images = [
    PIL.Image.new('RGB', (640, 480)),  # blank (all-black) image: yields no detections
    PIL.Image.open('picture.jpg'),
]

det = yolo(images)

Output:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-e00d4d91b5be> in <module>
     11 ]
     12 
---> 13 det = yolo(images)

~/miniconda3/envs/wstal/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/.cache/torch/hub/ultralytics_yolov5_master/models/common.py in forward(self, imgs, size, augment, profile)
    171                 y[i][:, :4] = scale_coords(shape1, y[i][:, :4], shape0[i])
    172 
--> 173         return Detections(imgs, y, self.names)
    174 
    175 

~/.cache/torch/hub/ultralytics_yolov5_master/models/common.py in __init__(self, imgs, pred, names)
    185         d = pred[0].device  # device
    186         gn = [torch.tensor([*[im.shape[i] for i in [1, 0, 1, 0]], 1., 1.], device=d) for im in imgs]  # normalizations
--> 187         self.xyxyn = [x / g for x, g in zip(self.xyxy, gn)]  # xyxy normalized
    188         self.xywhn = [x / g for x, g in zip(self.xywh, gn)]  # xywh normalized
    189         self.n = len(self.pred)

~/.cache/torch/hub/ultralytics_yolov5_master/models/common.py in <listcomp>(.0)
    185         d = pred[0].device  # device
    186         gn = [torch.tensor([*[im.shape[i] for i in [1, 0, 1, 0]], 1., 1.], device=d) for im in imgs]  # normalizations
--> 187         self.xyxyn = [x / g for x, g in zip(self.xyxy, gn)]  # xyxy normalized
    188         self.xywhn = [x / g for x, g in zip(self.xywh, gn)]  # xywh normalized
    189         self.n = len(self.pred)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Expected behavior

Running inference on a batch should not cause an error if one of the images in the batch contains no objects.

Example (one image only, containing some object):

import torch
import PIL.Image

yolo = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
yolo.to('cuda:0')
yolo = yolo.autoshape()

images = [
    PIL.Image.open('picture.jpg'),
]

det = yolo(images)
det.pred
[tensor([[1.87591e+02, 5.51864e+02, 3.39853e+02, 8.59066e+02, 8.55936e-01, 7.60000e+01],
         [5.86590e+02, 4.37439e+02, 7.19651e+02, 6.74310e+02, 6.16857e-01, 6.70000e+01],
         [5.87071e+02, 4.37369e+02, 7.17961e+02, 6.76576e+02, 4.29617e-01, 6.50000e+01],
         [7.52062e+02, 0.00000e+00, 9.28404e+02, 1.42560e+02, 3.03035e-01, 5.80000e+01],
         [1.04925e+03, 1.45816e+02, 1.15221e+03, 5.04700e+02, 2.93337e-01, 7.60000e+01],
         [9.32327e+02, 5.56952e+02, 1.24399e+03, 8.11386e+02, 2.89080e-01, 6.70000e+01],
         [9.30411e+02, 5.55609e+02, 1.25147e+03, 8.10758e+02, 2.72438e-01, 6.30000e+01]], device='cuda:0')]

Environment

  • pytorch 1.7.0
  • torchvision 0.8.1

Additional context

The bug happens because the method non_max_suppression, called on line 166 of models/common.py, always places its empty result tensor on the CPU. The method correctly returns an empty tensor when no objects are detected in an image; however, that tensor is created on the CPU regardless of the device the predictions live on.

Using the same two images as before and running under the debugger, we can print y just after the call to non_max_suppression. The first tensor, corresponding to the first (blank) image, is empty but sits on the wrong device (CPU instead of cuda:0):

[
    tensor([], size=(0, 6)), 
    tensor([[1.89654e+02, 5.47858e+02, 3.39484e+02, 8.56558e+02, 8.70220e-01, 7.60000e+01],
        [5.87016e+02, 4.40495e+02, 7.19600e+02, 6.72573e+02, 6.85066e-01, 6.70000e+01],
        [9.31165e+02, 5.56877e+02, 1.24205e+03, 8.10770e+02, 5.99013e-01, 6.70000e+01],
        [5.87003e+02, 4.37579e+02, 7.18493e+02, 6.75023e+02, 5.25025e-01, 6.50000e+01],
        [7.55756e+02, 0.00000e+00, 9.21378e+02, 1.41996e+02, 4.79300e-01, 5.80000e+01],
        [6.54120e+00, 2.76469e+02, 4.11797e+02, 5.17683e+02, 3.27896e-01, 7.30000e+01]], device='cuda:0')
]
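Until the NMS function itself is fixed, one user-side mitigation is to move every per-image result onto a single device before any arithmetic touches them. A minimal sketch (not code from the repository), assuming y is the list returned by non_max_suppression; unify_device is a hypothetical helper name:

```python
import torch

def unify_device(preds):
    # Pick the device of the first non-empty tensor; fall back to CPU if
    # every image in the batch produced zero detections.
    target = next((t.device for t in preds if t.numel() > 0),
                  torch.device('cpu'))
    return [t.to(target) for t in preds]

# Mixed list mimicking the debugger output above (both tensors are on CPU
# here for demonstration; in the bug report the second one lives on cuda:0).
y = [torch.zeros(0, 6), torch.rand(3, 6)]
y = unify_device(y)
assert all(t.device == y[1].device for t in y)
```

With all tensors on one device, the normalization in Detections.__init__ no longer mixes cuda:0 and cpu operands.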
@baldassarreFe baldassarreFe added the bug Something isn't working label Dec 6, 2020
github-actions bot commented Dec 6, 2020

Hello @baldassarreFe, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

baldassarreFe added a commit to baldassarreFe/yolov5 that referenced this issue Dec 6, 2020
Workaround issue ultralytics#1617. Probably, the actual solution is to modify `non_max_suppression`.
@glenn-jocher (Member)
@baldassarreFe thanks for the bug report. I am able to reproduce this in a Colab notebook. It looks like the best solution would be to properly initialize the empty tensors on the same device as the incoming data in the NMS function. I will take a look.
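The idea described above — creating the "no detections" placeholder on the device of the incoming predictions — can be sketched as follows. This is an illustrative pattern, not the actual patch; empty_detections_like is a hypothetical helper name:

```python
import torch

def empty_detections_like(prediction):
    # Build the zero-detection placeholder (0 rows, 6 columns: xyxy, conf,
    # class) on the same device as the raw prediction tensor, so downstream
    # code such as the normalization in Detections.__init__ never mixes
    # devices.
    return torch.zeros((0, 6), device=prediction.device)

# CPU demonstration: the placeholder inherits the input's device.
pred = torch.rand(1, 100, 6)  # batch of raw predictions (CPU here)
empty = empty_detections_like(pred)
assert empty.device == pred.device
assert empty.shape == (0, 6)
```

Passing device= at creation time avoids both the mismatch and an extra host-to-device copy that a later .to() call would incur.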

@glenn-jocher glenn-jocher linked a pull request Dec 6, 2020 that will close this issue
@glenn-jocher (Member)

I verified it works now, problem solved! :)

(Screenshot attached: 2020-12-06 at 17:56:46.)

@baldassarreFe (Author)

Thanks for the quick fix!

@glenn-jocher (Member)

@baldassarreFe thanks for the feedback! If you see any other areas that need improvement please let us know.
