
How to do object detection on video or streaming data #2045

Closed
Surayuth opened this issue Jan 26, 2021 · 24 comments

Labels
question Further information is requested

Comments

@Surayuth

❔Question

I'm new to this framework. Can someone guide me on how to do object detection on video and streaming data using YOLOv5? My problem is that I want to detect objects from a video game in real time (while I'm playing the game). Do I have to capture the screen every second and then pass the screenshot to the model?

Thanks.


Surayuth added the question label on Jan 26, 2021
@glenn-jocher
Member

@Surayuth see README for inference examples on videos and streaming sources:
https://github.com/ultralytics/yolov5#inference

Inference

detect.py runs inference on a variety of sources, downloading models automatically from the latest YOLOv5 release and saving results to runs/detect.

$ python detect.py --source 0  # webcam
                            file.jpg  # image 
                            file.mp4  # video
                            path/  # directory
                            path/*.jpg  # glob
                            rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa  # rtsp stream
                            rtmp://192.168.1.105/live/test  # rtmp stream
                            http://112.50.243.8/PLTV/88888888/224/3221225900/1.m3u8  # http stream

@meadlai

meadlai commented Nov 6, 2021

@glenn-jocher, thank you for your guide. With the webcam as the source, is there any way to show/display the detection results directly? And to get the coordinates of the objects?

@glenn-jocher
Member

glenn-jocher commented Nov 8, 2021

@meadlai python detect.py --source 0 already displays the results in real time.

For returning results in a Python environment, you probably want to use a YOLOv5 PyTorch Hub model:

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Image
img = 'https://ultralytics.com/images/zidane.jpg'

# Inference
results = model(img)

results.pandas().xyxy[0]
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 1  433.50  433.50   517.5  714.5    0.687988     27     tie
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie
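
To pull the coordinates out programmatically (per the question above), a minimal sketch using the same model and results: results.xyxy[0] is an (n, 6) tensor holding x1, y1, x2, y2, confidence, and class for each detection.

# Minimal sketch: iterate over the raw detections from the hub model above
for *box, conf, cls in results.xyxy[0].tolist():
    x1, y1, x2, y2 = box
    print(f'{model.names[int(cls)]}: conf={conf:.2f} box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})')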

See PyTorch Hub tutorial for details:

YOLOv5 Tutorials

@Neel7317

@meadlai here is the code you are looking for:

import torch
import cv2
from time import time

class OD:

    def __init__(self, capture_index, model_name):
        """
        Initializes the detector with a capture source and model weights.
        :param capture_index: index of the video capture device (e.g. 0 for the default webcam).
        :param model_name: path to custom .pt weights, or an empty value to use pretrained yolov5s.
        """
        self.capture_index = capture_index
        self.model = self.load_model(model_name)
        self.classes = self.model.names
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        print("Using Device: ", self.device)

    def get_video_capture(self):
        """
        Creates a new video capture object to extract the video frame by frame for prediction.
        :return: OpenCV VideoCapture object.
        """
        return cv2.VideoCapture(self.capture_index)

    def load_model(self, model_name):
        """
        Loads a YOLOv5 model from PyTorch Hub.
        :return: trained PyTorch model.
        """
        if model_name:
            model = torch.hub.load('ultralytics/yolov5', 'custom', path=model_name, force_reload=True)
        else:
            model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
        return model

    def score_frame(self, frame):
        """
        Takes a single frame as input and scores it with the YOLOv5 model.
        :param frame: input frame as a numpy array.
        :return: labels and normalized xyxyn coordinates of the objects detected in the frame.
        """
        self.model.to(self.device)
        frame = [frame]
        results = self.model(frame)
        labels, cord = results.xyxyn[0][:, -1], results.xyxyn[0][:, :-1]
        return labels, cord

    def class_to_label(self, x):
        """
        For a given numeric label, return the corresponding string label.
        :param x: numeric label
        :return: corresponding string label
        """
        return self.classes[int(x)]

    def plot_boxes(self, results, frame):
        """
        Takes a frame and its results as input, and plots the bounding boxes and labels onto the frame.
        :param results: labels and coordinates predicted by the model for the given frame.
        :param frame: frame which has been scored.
        :return: frame with bounding boxes and labels plotted on it.
        """
        labels, cord = results
        n = len(labels)
        x_shape, y_shape = frame.shape[1], frame.shape[0]
        for i in range(n):
            row = cord[i]
            if row[4] >= 0.3:  # confidence threshold
                # Coordinates are normalized, so scale them back to pixel values
                x1, y1, x2, y2 = int(row[0]*x_shape), int(row[1]*y_shape), int(row[2]*x_shape), int(row[3]*y_shape)
                bgr = (0, 255, 0)
                cv2.rectangle(frame, (x1, y1), (x2, y2), bgr, 2)
                cv2.putText(frame, self.class_to_label(labels[i]), (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.9, bgr, 2)

        return frame

    def __call__(self):
        """
        Runs the main loop: reads the video frame by frame, scores and annotates each frame,
        and displays the result until 'q' is pressed.
        """
        cap = self.get_video_capture()
        assert cap.isOpened()

        while True:
            ret, frame = cap.read()
            assert ret

            frame = cv2.resize(frame, (640, 640))

            start_time = time()
            results = self.score_frame(frame)
            frame = self.plot_boxes(results, frame)
            end_time = time()

            # Guard against division by zero on very fast frames
            fps = 1 / max(end_time - start_time, 1e-6)
            cv2.putText(frame, f'FPS: {int(fps)}', (20, 70), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 2)

            cv2.imshow('YOLOv5 Detection', frame)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        cap.release()
        cv2.destroyAllWindows()

Create a new object and execute.

detector = OD(capture_index=0, model_name='320_yolo_400.pt')
detector()
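
Note: '320_yolo_400.pt' above is the commenter's own custom weights file; per load_model, passing an empty model_name falls back to the pretrained yolov5s weights.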

@gustavozantut

https://github.com/gustavozantut/yolov5_live_results streams output frame results in real time.

@wb-08

wb-08 commented Sep 10, 2022

import cv2
import numpy as np
import torch
from PIL import Image
from mss import mss

# Load custom weights from a local yolov5 repo clone
model = torch.hub.load("yolov5", 'custom', path="/yolov5/best.pt", source='local')

sct = mss()
w, h = 1920, 1080
monitor = {'top': 0, 'left': 0, 'width': w, 'height': h}

while True:
    # Grab the screen as an RGB image (the hub model expects RGB numpy arrays)
    img = Image.frombytes('RGB', (w, h), sct.grab(monitor).rgb)
    screen = np.array(img)
    # Run the model on the screen capture
    result = model(screen, size=640)
    print(result)
    # render() draws the boxes in place and returns RGB; convert to BGR for imshow
    cv2.imshow('Screen', cv2.cvtColor(result.render()[0], cv2.COLOR_RGB2BGR))

    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break

@glenn-jocher
Member

glenn-jocher commented Sep 10, 2022

@wb-08 how does mss() compare to our official screenshot example?

[screenshot of the official screenshot inference example]

@wb-08

wb-08 commented Sep 11, 2022

@glenn-jocher, but it doesn't work correctly for video

@glenn-jocher
Member

@wb-08 yes, the example is for an image, but it can obviously be dropped into a loop and turned into video.

@wb-08

wb-08 commented Sep 11, 2022

@glenn-jocher, maybe into a while loop?
In any case, you can try my solution and the official one and see that mine works better :)

@glenn-jocher
Member

glenn-jocher commented Sep 11, 2022

@wb-08 got it. I tried it out, and mss seems to be much faster than the PIL image grab method: about 30 ms vs 400 ms for the tutorial method:

import time

import cv2
import mss
import numpy

with mss.mss() as sct:
    # Part of the screen to capture
    monitor = sct.monitors[0]

    while "Screen capturing":
        last_time = time.time()

        # Get raw pixels from the screen, save it to a Numpy array
        img = numpy.array(sct.grab(monitor))

        # Display the picture
        cv2.imshow("OpenCV/Numpy normal", img)

        # Display the picture in grayscale
        # cv2.imshow('OpenCV/Numpy grayscale',
        #            cv2.cvtColor(img, cv2.COLOR_BGRA2GRAY))

        print("fps: {}".format(1 / (time.time() - last_time)))

        # Press "q" to quit
        if cv2.waitKey(25) & 0xFF == ord("q"):
            cv2.destroyAllWindows()
            break

@glenn-jocher
Member

@AyushExel this mss screenshot loader works really well; we should integrate it into StreamLoader with a reserved detect.py source name, like this:

python detect.py --source screen

@zombob
Contributor

zombob commented Sep 21, 2022


Maybe better:
python detect.py --source screen top left width height

@glenn-jocher
Member

@zombob not a bad idea. We're super busy and haven't had time to work on this feature yet, but if you'd like to help with a PR that would be great!

@zombob
Contributor

zombob commented Sep 22, 2022


Fixed this, PR link: #9542
Use it like:

python .\detect.py --source screen   # default: full screen (screen 0)
python .\detect.py --source "screen 2"   # 2nd screen; if you have multiple monitors, you can specify the screen number
python .\detect.py --source "screen 500 600 256 256"   # specify top, left, width and height
python .\detect.py --source "screen 1 500 100 256 256"   # specify screen number, top, left, width and height

@DABHIHARDIK

If I want to do detection on a YouTube live-stream video or an RTMP server, what should I do?

@gustavozantut

gustavozantut commented May 2, 2023


Just specify "--source {link_here}" when calling detect.py. YouTube works, RTSP too; not sure about RTMP.

@glenn-jocher
Member

@gustavozantut yes, you can use the YOLOv5 detector to perform object detection on YouTube live-streaming videos or RTMP servers. You just need to specify the source URL when calling detect.py with the --source flag, as in the following example:

python detect.py --source "http://youtube.com/watch?v=abcdefghijk"

Note that you should specify the streaming link and not the webpage's link. For RTMP servers I'm not completely sure, but you may be able to use the same command; just replace the source URL with your RTMP server URL.
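
For reference, the README snippet earlier in this thread already lists an RTMP stream among detect.py sources, so a command of this shape should work (URL purely illustrative, taken from that snippet):

python detect.py --source "rtmp://192.168.1.105/live/test"  # rtmp stream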

@fatmaboodai


Hello,
Is it possible to have the detection happen in real time as an overlay within the Chrome browser window? For example, could I pass the screen recording to the model and render the boxes as an overlay on the tab I'm viewing inside the Chrome browser?

@glenn-jocher
Member

@fatmaboodai yes, it is possible to achieve real-time object detection as an overlay within a Chrome browser window by using the YOLOv5 model. You can capture the screen recording and pass it to the model for object detection. Then, you can use browser-based technologies such as WebRTC, HTML5 canvas, or WebGL to overlay the detected boxes onto the tab you are viewing inside the Chrome browser. This would involve a combination of capturing the screen, processing the frames with YOLOv5, and rendering the results as an overlay in the browser window.
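
As a minimal sketch of one piece of that pipeline (assuming Flask, mss, and the pretrained hub model; this shows detections in a Chrome tab at http://localhost:5000/ rather than as a true overlay on top of another tab, which would still need the browser-side work described above), the annotated screen captures can be served as an MJPEG stream:

import cv2
import numpy as np
import torch
from flask import Flask, Response
from mss import mss

app = Flask(__name__)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or your custom weights

def frames():
    with mss() as sct:
        monitor = sct.monitors[1]  # primary monitor
        while True:
            # Grab the screen (BGRA) and convert to RGB for the hub model
            rgb = cv2.cvtColor(np.array(sct.grab(monitor)), cv2.COLOR_BGRA2RGB)
            results = model(rgb)
            # render() draws boxes in place; convert back to BGR for JPEG encoding
            annotated = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)
            ok, jpg = cv2.imencode('.jpg', annotated)
            if ok:
                yield b'--frame\r\nContent-Type: image/jpeg\r\n\r\n' + jpg.tobytes() + b'\r\n'

@app.route('/')
def stream():
    return Response(frames(), mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run()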

@fatmaboodai


Just another question:
I already have a model trained on a custom dataset that is YOLOv8-based; can I achieve the same goal using it?

@glenn-jocher
Member

@fatmaboodai Yes, you can achieve the same goal using your custom YOLOv8-based model. You can capture the screen recording, pass it to your custom model for object detection, and then use browser-based technologies to overlay the detected boxes onto the Chrome browser window. The process would be similar to what I described earlier, but using your custom YOLOv8-based model for object detection instead of YOLOv5.
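
A minimal sketch under those assumptions, using the ultralytics package's YOLOv8 API with mss for screen capture ('best.pt' is a hypothetical path standing in for your custom weights):

import cv2
import numpy as np
from mss import mss
from ultralytics import YOLO  # YOLOv8 API

model = YOLO('best.pt')  # hypothetical path to your custom YOLOv8 weights

with mss() as sct:
    monitor = sct.monitors[1]  # primary monitor
    while True:
        # Grab the screen and drop the alpha channel (mss returns BGRA)
        frame = np.array(sct.grab(monitor))[:, :, :3]
        results = model(frame)
        # plot() returns the frame with boxes and labels drawn on it
        cv2.imshow('YOLOv8 Screen Detection', results[0].plot())
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

cv2.destroyAllWindows()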

@fatmaboodai

Thank you so much, I really appreciate it 🙏🏽🙏🏽

@glenn-jocher
Member

@fatmaboodai you're welcome! If you have any more questions or need further assistance, feel free to ask. Good luck with your project!
