
Batched Inference slower than frame by frame #9987

Closed
1 task done
ahmadmustafaanis opened this issue Oct 31, 2022 · 3 comments
Labels
question Further information is requested

Comments

@ahmadmustafaanis (Contributor) commented Oct 31, 2022
Search before asking

Question

Hi, I have read that batch inference is always faster than frame-by-frame inference, but in my case I am getting the opposite result.

Here is my code:

import torch
import cv2
import time

# Read up to 100 frames from the video
frames = []
cap = cv2.VideoCapture("test.mp4")
for i in range(100):
    ret, frame = cap.read()
    if ret:
        frames.append(frame)
    else:
        break

model = torch.hub.load('ultralytics/yolov5', 'custom', path='model1.pt')  # custom model

# Frame-by-frame inference
start = time.time()
for frame in frames:
    results = model(frame, size=720)

end = time.time()
print("Time taken (Frame by Frame): ", end - start)

# Batched inference on the full list of frames
start = time.time()
results = model(frames, size=720)
end = time.time()
print("Time taken (Batch Prediction): ", end - start)

The output is:

Time taken (Frame by Frame):  33.28466463088989
Time taken (Batch Prediction):  43.20599818229675

This is very strange, as batch prediction should be faster.
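
For completeness, here is the same comparison timed more carefully (a minimal sketch: a warm-up call first, and a GPU synchronization around each measurement so asynchronous CUDA work is fully counted; the sync is skipped on CPU):

def sync():
    # Wait for pending CUDA work before reading the clock (no-op on CPU)
    if torch.cuda.is_available():
        torch.cuda.synchronize()

_ = model(frames[0], size=720)  # warm-up call (CUDA context init, autotuning)

sync()
start = time.time()
for frame in frames:
    results = model(frame, size=720)
sync()
print("Time taken (Frame by Frame): ", time.time() - start)

sync()
start = time.time()
results = model(frames, size=720)
sync()
print("Time taken (Batch Prediction): ", time.time() - start)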

Additional

No response

@glenn-jocher (Member) commented Oct 31, 2022

👋 Hello! Thanks for asking about inference speed issues. PyTorch Hub speeds will vary by hardware, software, model, inference settings, etc. Our default example in Colab with a V100 looks like this:

[Screenshot: example PyTorch Hub inference speeds in Colab with a V100]

YOLOv5 🚀 can be run on CPU (i.e. --device cpu, slow) or GPU if available (i.e. --device 0, faster). You can determine your inference device by viewing the YOLOv5 console output:

detect.py inference

python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images/

[Screenshot: detect.py console output showing the inference device]

YOLOv5 PyTorch Hub inference

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()
# Speed: 631.5ms pre-process, 19.2ms inference, 1.6ms NMS per image at shape (2, 3, 640, 640)

Increase Speeds

If you would like to increase your inference speed, some options are:

  • Use batched inference with YOLOv5 PyTorch Hub
  • Reduce --img-size, i.e. 1280 -> 640 -> 320
  • Reduce model size, i.e. YOLOv5x -> YOLOv5l -> YOLOv5m -> YOLOv5s -> YOLOv5n
  • Use half precision FP16 inference with python detect.py --half and python val.py --half
  • Use a faster GPU, i.e. P100 -> V100 -> A100
  • Export to ONNX or OpenVINO for up to 3x CPU speedup (CPU Benchmarks)
  • Export to TensorRT for up to 5x GPU speedup (GPU Benchmarks)
  • Use free GPU backends with up to 16 GB of CUDA memory (Colab, Kaggle)
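
As a minimal sketch of two of these options combined, batched PyTorch Hub inference can be run with a smaller model at a reduced image size (assumptions: the yolov5s model and the example image URLs from the snippet above):

import torch

# Load the small model from PyTorch Hub (smaller model -> faster inference)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Batch of images, run at a reduced inference size (640 -> 320)
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]

results = model(imgs, size=320)
results.print()  # reported per-image speeds should drop vs. size=640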

Good luck 🍀 and let us know if you have any other questions!

@ahmadmustafaanis (Contributor, Author) commented:
Another question: in batched inference, can we specify a fixed batch size regardless of the incoming batch size? For example, if the list of images (the batch) contains 100 images, I want it to be processed automatically in batches of 32, 32, 32 and 4.

@glenn-jocher (Member) commented:
@ahmadmustafaanis PyTorch Hub models run at the batch size you provide in your list of inputs, whether that's 1 or 100.

Larger batch sizes should provide faster speeds per image (though each call of course takes longer overall, since it processes more images). See the YOLOv5 batch size vs. speed study:
https://community.ultralytics.com/t/yolov5-study-batch-size-vs-speed
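
If you want a fixed batch size on your side regardless of how many images come in, one option (a minimal sketch, not a built-in YOLOv5 feature) is to chunk the input list yourself before calling the model, reusing the frames and model from your snippet above:

# Minimal sketch: run a list of frames in fixed-size chunks (here 32), so
# 100 frames are processed as batches of 32, 32, 32 and 4.
batch_size = 32
all_results = []
for i in range(0, len(frames), batch_size):
    batch = frames[i:i + batch_size]          # the last chunk may be smaller
    all_results.append(model(batch, size=720))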
