
Device mismatch when an image in the input batch gives 0 detections at inference time #1617

Closed
baldassarreFe opened this issue Dec 6, 2020 · 5 comments · Fixed by #1619
Labels
bug Something isn't working

Comments

@baldassarreFe

🐛 Bug

The bug happens when:

  • yolo is loaded from torch hub (I haven't tried otherwise)
  • yolo is placed on a cuda device
  • autoshape is enabled
  • no objects are detected for one image of the batch

To Reproduce

Input:

import torch
import PIL.Image

yolo = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
yolo.to('cuda:0')
yolo = yolo.autoshape()

images = [
    PIL.Image.new('RGB', (640, 480)),  # blank (all-black) image: yields no detections
    PIL.Image.open('picture.jpg'),
]

det = yolo(images)

Output:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-e00d4d91b5be> in <module>
     11 ]
     12 
---> 13 det = yolo(images)

~/miniconda3/envs/wstal/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/.cache/torch/hub/ultralytics_yolov5_master/models/common.py in forward(self, imgs, size, augment, profile)
    171                 y[i][:, :4] = scale_coords(shape1, y[i][:, :4], shape0[i])
    172 
--> 173         return Detections(imgs, y, self.names)
    174 
    175 

~/.cache/torch/hub/ultralytics_yolov5_master/models/common.py in __init__(self, imgs, pred, names)
    185         d = pred[0].device  # device
    186         gn = [torch.tensor([*[im.shape[i] for i in [1, 0, 1, 0]], 1., 1.], device=d) for im in imgs]  # normalizations
--> 187         self.xyxyn = [x / g for x, g in zip(self.xyxy, gn)]  # xyxy normalized
    188         self.xywhn = [x / g for x, g in zip(self.xywh, gn)]  # xywh normalized
    189         self.n = len(self.pred)

~/.cache/torch/hub/ultralytics_yolov5_master/models/common.py in <listcomp>(.0)
    185         d = pred[0].device  # device
    186         gn = [torch.tensor([*[im.shape[i] for i in [1, 0, 1, 0]], 1., 1.], device=d) for im in imgs]  # normalizations
--> 187         self.xyxyn = [x / g for x, g in zip(self.xyxy, gn)]  # xyxy normalized
    188         self.xywhn = [x / g for x, g in zip(self.xywh, gn)]  # xywh normalized
    189         self.n = len(self.pred)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Expected behavior

Running inference on a batch should not cause an error if one of the images in the batch contains no objects.

Example (one image only, containing some object):

import torch
import PIL.Image

yolo = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
yolo.to('cuda:0')
yolo = yolo.autoshape()

images = [
    PIL.Image.open('picture.jpg'),
]

det = yolo(images)
det.pred
[tensor([[1.87591e+02, 5.51864e+02, 3.39853e+02, 8.59066e+02, 8.55936e-01, 7.60000e+01],
         [5.86590e+02, 4.37439e+02, 7.19651e+02, 6.74310e+02, 6.16857e-01, 6.70000e+01],
         [5.87071e+02, 4.37369e+02, 7.17961e+02, 6.76576e+02, 4.29617e-01, 6.50000e+01],
         [7.52062e+02, 0.00000e+00, 9.28404e+02, 1.42560e+02, 3.03035e-01, 5.80000e+01],
         [1.04925e+03, 1.45816e+02, 1.15221e+03, 5.04700e+02, 2.93337e-01, 7.60000e+01],
         [9.32327e+02, 5.56952e+02, 1.24399e+03, 8.11386e+02, 2.89080e-01, 6.70000e+01],
         [9.30411e+02, 5.55609e+02, 1.25147e+03, 8.10758e+02, 2.72438e-01, 6.30000e+01]], device='cuda:0')]

Environment

  • pytorch 1.7.0
  • torchvision 0.8.1

Additional context

The bug happens because the method non_max_suppression, called on line 166 of models/common.py, always places its empty result tensor on the CPU. The method correctly returns an empty tensor when no objects are detected in an image; however, that tensor is created on the CPU regardless of the device the predictions live on.

Using the same two images as before and running under the debugger, we can print y just after the call to non_max_suppression. The first tensor, corresponding to the first (blank) image, is empty but sits on the wrong device (CPU instead of cuda:0):

[
    tensor([], size=(0, 6)), 
    tensor([[1.89654e+02, 5.47858e+02, 3.39484e+02, 8.56558e+02, 8.70220e-01, 7.60000e+01],
        [5.87016e+02, 4.40495e+02, 7.19600e+02, 6.72573e+02, 6.85066e-01, 6.70000e+01],
        [9.31165e+02, 5.56877e+02, 1.24205e+03, 8.10770e+02, 5.99013e-01, 6.70000e+01],
        [5.87003e+02, 4.37579e+02, 7.18493e+02, 6.75023e+02, 5.25025e-01, 6.50000e+01],
        [7.55756e+02, 0.00000e+00, 9.21378e+02, 1.41996e+02, 4.79300e-01, 5.80000e+01],
        [6.54120e+00, 2.76469e+02, 4.11797e+02, 5.17683e+02, 3.27896e-01, 7.30000e+01]], device='cuda:0')
]
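Until the NMS function itself is fixed, one user-side mitigation is to move every per-image result onto a single device before any arithmetic touches them. A minimal sketch (not code from the repository), assuming y is the list returned by non_max_suppression; unify_device is a hypothetical helper name:

```python
import torch

def unify_device(preds):
    # Pick the device of the first non-empty tensor; fall back to CPU if
    # every image in the batch produced zero detections.
    target = next((t.device for t in preds if t.numel() > 0),
                  torch.device('cpu'))
    return [t.to(target) for t in preds]

# Mixed list mimicking the debugger output above (both tensors are on CPU
# here for demonstration; in the bug report the second one lives on cuda:0).
y = [torch.zeros(0, 6), torch.rand(3, 6)]
y = unify_device(y)
assert all(t.device == y[1].device for t in y)
```

With all tensors on one device, the normalization in Detections.__init__ no longer mixes cuda:0 and cpu operands.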
@baldassarreFe baldassarreFe added the bug Something isn't working label Dec 6, 2020
github-actions bot commented Dec 6, 2020

Hello @baldassarreFe, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

baldassarreFe added a commit to baldassarreFe/yolov5 that referenced this issue Dec 6, 2020
Workaround issue ultralytics#1617. Probably, the actual solution is to modify `non_max_suppression`.
@glenn-jocher (Member)
@baldassarreFe thanks for the bug report. I am able to reproduce this in a Colab notebook. It looks like the best solution would be to properly initialize the empty tensors on the same device as the incoming data in the NMS function. I will take a look.
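The idea described above — creating the "no detections" placeholder on the device of the incoming predictions — can be sketched as follows. This is an illustrative pattern, not the actual patch; empty_detections_like is a hypothetical helper name:

```python
import torch

def empty_detections_like(prediction):
    # Build the zero-detection placeholder (0 rows, 6 columns: xyxy, conf,
    # class) on the same device as the raw prediction tensor, so downstream
    # code such as the normalization in Detections.__init__ never mixes
    # devices.
    return torch.zeros((0, 6), device=prediction.device)

# CPU demonstration: the placeholder inherits the input's device.
pred = torch.rand(1, 100, 6)  # batch of raw predictions (CPU here)
empty = empty_detections_like(pred)
assert empty.device == pred.device
assert empty.shape == (0, 6)
```

Passing device= at creation time avoids both the mismatch and an extra host-to-device copy that a later .to() call would incur.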

@glenn-jocher glenn-jocher linked a pull request Dec 6, 2020 that will close this issue
@glenn-jocher (Member)

I verified it works now, problem solved! :)

(Screenshot attached: 2020-12-06 at 17:56:46.)

@baldassarreFe (Author)

Thanks for the quick fix!

@glenn-jocher (Member)

@baldassarreFe thanks for the feedback! If you see any other areas that need improvement please let us know.
