Update: to reproduce the behavior, the following steps are necessary.

Result/Environment: without the delay of 1 second I get a constant inference time of 8 ms. It seems like my GPU is going into a low-power sleep mode. How can I solve this problem?
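If the slowdown really is the GPU downclocking while idle between frames, one workaround is to keep it busy with throwaway work. A minimal sketch, assuming a cheap dummy inference callable (the `keep_warm` helper and its `interval` are hypothetical, not part of `detect.py`):

```python
import threading

def keep_warm(infer, interval=0.25, stop_event=None):
    """Run infer() periodically in a daemon thread so the GPU never idles
    long enough to drop to low-power clocks.

    `infer` should be a cheap dummy inference, e.g. a forward pass on a
    zero tensor. `interval` (seconds) is an assumption; tune it so it is
    shorter than the GPU's idle-downclock threshold.
    """
    stop_event = stop_event or threading.Event()

    def loop():
        while not stop_event.is_set():
            infer()                    # throwaway call keeps clocks up
            stop_event.wait(interval)  # sleep, but wake promptly on stop

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop_event, thread
```

Alternatively, locking the GPU clocks at the driver level (e.g. via `nvidia-smi`) avoids burning cycles on dummy inferences, but requires admin rights.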
---
Hi,

I'm using an Nvidia RTX A2000 for inference.

![grafik](https://user-images.githubusercontent.com/51398244/228233147-48160965-a40c-4cb2-ab2a-efafbcfd3041.png)

One image is classified every second (always the same image, containing only a single object).

I measured the execution time of `detect.py`:

![grafik](https://user-images.githubusercontent.com/51398244/228234270-143797fe-0a02-4da9-a7ab-0d8ed5d697e8.png)

After 5 seconds (5 images) the execution time is stable at 95 ms.
When I delete the following code, the execution time after 5 s is stable at 55 ms:
`model.warmup(imgsz=(1 if pt else bs, 3, *imgsz))`
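For context, `model.warmup` in YOLOv5 just runs a dummy forward pass so one-time costs (CUDA context setup, kernel compilation) are paid before the first real image; it should only affect the earliest iterations, not the steady state. A framework-free sketch of per-iteration timing to check where the steady state settles (`run_once` stands in for one full detect step; it is an assumption, not YOLOv5 code):

```python
import time

def time_iterations(run_once, n=10):
    """Call run_once() n times and return per-call latency in milliseconds.

    Comparing the first few entries against the tail shows how long the
    pipeline takes to reach its steady-state time.
    """
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        run_once()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies
```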
When I directly return an empty result at the start of the `non_max_suppression` function (without executing the rest of the function), the execution time after 5 s is stable at 10 ms:
`return [torch.zeros((0, 6), device=prediction.device)]`
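One profiling pitfall worth ruling out before blaming `non_max_suppression` itself: CUDA kernels launch asynchronously, so a Python-level timer around NMS can silently absorb the wait for the preceding forward pass to finish. A sketch of a timer that takes an explicit sync callback (pass `torch.cuda.synchronize` when measuring CUDA code; the no-op default is for CPU-only runs):

```python
import time

def timed(fn, sync=lambda: None):
    """Return (result, elapsed_ms) for fn().

    `sync` is called before and after fn so pending asynchronous GPU work
    queued by earlier calls is not billed to fn; pass
    torch.cuda.synchronize when timing CUDA code.
    """
    sync()                      # drain work queued before fn
    start = time.perf_counter()
    result = fn()
    sync()                      # wait for fn's own async work to finish
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```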
Why do the `warmup` and `non_max_suppression` functions take so long?