How to increase camera-capture FPS on a Raspberry Pi 4B (8 GB) with a best.onnx model #13144

Killuagg opened this issue Jun 27, 2024 · 10 comments

@Killuagg

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Detection

Bug

Hi, I am currently building traffic sign detection and recognition with YOLOv5 (PyTorch) using the YOLOv5s model. I run the model with detect.py and get only 1 FPS. The dataset contains around 2K images and I trained for 200 epochs. I run the model with:
python detect.py --weights best.onnx --img 640 --conf 0.7 --source 0

Is there any modification to the code so that I can get more than 4 FPS?

Environment

- Raspberry Pi 4B with 8 GB RAM
- Webcam
- Model: best.onnx
- Trained with YOLOv5 (PyTorch)

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
Contributor

👋 Hello @Killuagg, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of our up-to-date verified environments (with all dependencies, including CUDA/CUDNN, Python and PyTorch, preinstalled).

Status

If the YOLOv5 CI badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@glenn-jocher
Member

@Killuagg hi there,

Thank you for reaching out and for providing details about your setup and issue. To help you increase the FPS for your camera capture on the Raspberry Pi 4B, here are a few suggestions:

  1. Verify Latest Versions: Ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.

  2. Optimize Model Inference:

    • Use TensorRT (NVIDIA GPUs only): TensorRT can significantly improve inference speed, but it requires an NVIDIA GPU, so it does not apply to the Raspberry Pi's CPU (see the follow-up below). On a supported device such as a Jetson, the conversion looks like:
      sudo apt-get install -y libopenblas-base libopenmpi-dev
      wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt -O yolov5s.pt
      python3 export.py --weights yolov5s.pt --img 640 --batch 1 --device 0 --include engine
      This will generate a TensorRT engine file which you can use for inference.
  3. Reduce Image Size: Lowering the input size can increase FPS. You can try reducing the --img parameter to 320 or even lower, depending on your accuracy requirements; note that a fixed-shape ONNX export must be re-exported at the new size first:

    python detect.py --weights best.onnx --img 320 --conf 0.7 --source 0
  4. Use a More Efficient Model: If you are using yolov5s, you might want to try yolov5n (nano), which is designed to be more lightweight and faster, though with a potential trade-off in accuracy:

    python detect.py --weights yolov5n.onnx --img 640 --conf 0.7 --source 0
  5. Optimize Code: Ensure that your code is optimized for performance. For example, make sure that the webcam capture and model inference are not blocking each other. You can use threading to handle webcam capture and inference in parallel.

  6. Hardware Acceleration: Make sure you are using whatever acceleration the Raspberry Pi offers: a NEON-enabled OpenCV build and an inference runtime configured to use all four CPU cores. A sketch of the latter follows below.
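
As a concrete starting point for point 6, here is a minimal sketch for a CPU-only device, assuming onnxruntime is installed and that best.onnx takes a 1x3x640x640 float32 input (adjust the shape to your export):

import numpy as np
import onnxruntime as ort

# Limit ONNX Runtime to the Pi 4B's four cores and enable full graph optimization
opts = ort.SessionOptions()
opts.intra_op_num_threads = 4
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session = ort.InferenceSession("best.onnx", sess_options=opts, providers=["CPUExecutionProvider"])

# Dummy input just to exercise the session; real inference feeds preprocessed frames
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: dummy})
print([o.shape for o in outputs])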

If you continue to experience issues, please provide a minimal reproducible example of your code. This will help us investigate further. You can find more details on creating a minimal reproducible example here.

Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help! 😊

@Killuagg
Author

Thanks for your reply. First, when I try to run detect.py with --img 320, it produces an error: expected 620 not 320 size. So I can only run 640 on my Raspberry Pi. If I want to run the TensorRT model on my Raspberry Pi, do I need to run it on a GPU? The only device available is the CPU. Is there any code inside detect.py that limits my FPS?

@glenn-jocher
Member

Hi @Killuagg,

Thank you for your follow-up and for providing additional details. Let's address your concerns one by one.

Image Size Error

The error you encountered (expected 620 not 320 size) indicates that your ONNX model was exported with a fixed input size, so detect.py cannot feed it a different one. To use a smaller size you need to re-export the model at that size (or with dynamic shapes); if you stay with the existing export, you are constrained to 640, so let's focus on optimizing other aspects.
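
If you still have the best.pt checkpoint, one option (a sketch, assuming the standard YOLOv5 export.py) is to re-export the ONNX model so its input matches the size you want to run:

python export.py --weights best.pt --include onnx --img 320   # fixed 320x320 input
python export.py --weights best.pt --include onnx --dynamic   # or dynamic axes

Then call detect.py with the matching --img value.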

TensorRT on Raspberry Pi

TensorRT only runs on NVIDIA GPUs, and the Raspberry Pi 4B has no CUDA-capable GPU, so TensorRT is not an option here; all inference will run on the CPU. You can still optimize your setup:

  1. Use a CPU-friendly runtime: Stick with your ONNX model (or a TFLite export via export.py --include tflite) and make sure the runtime is configured to use all four CPU cores.

  2. Optimize Inference Code: Ensure that your inference code is as efficient as possible. For example, you can use threading to handle webcam capture and model inference in parallel, reducing any potential bottlenecks.

Code Example for Threading

Here's an example of how you might use threading to improve performance (a sketch; the third-party yolov5 pip package shown here loads .pt weights, so adapt the model-loading lines to your own ONNX setup):

import threading
import time

import cv2
from yolov5 import YOLOv5  # third-party `yolov5` pip package

# Load model (loads .pt weights; an ONNX model would need its own runtime)
model = YOLOv5("best.pt", device="cpu")

frame = None
lock = threading.Lock()  # guard the shared frame between threads

# Function to capture frames
def capture_frames():
    """Continuously grab frames so inference always sees the latest one."""
    global frame
    cap = cv2.VideoCapture(0)
    while True:
        ret, new_frame = cap.read()
        if not ret:
            break
        with lock:
            frame = new_frame
    cap.release()

# Function to run inference
def run_inference():
    """Run inference on the most recent frame, independent of capture rate."""
    while True:
        with lock:
            current = None if frame is None else frame.copy()
        if current is not None:
            results = model.predict(current)
            # Process results here
        time.sleep(0.01)  # yield briefly; adjust as needed

# Start threads
thread1 = threading.Thread(target=capture_frames, daemon=True)
thread2 = threading.Thread(target=run_inference, daemon=True)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
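
The design idea is to decouple camera I/O from compute: the capture thread keeps grabbing frames so inference always runs on the latest frame rather than a stale buffered one. Note that overall FPS is still bounded by inference time; threading removes capture latency from the loop but does not make the model itself faster.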

Verify Latest Versions

Please ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.
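
For example, assuming you cloned the repository with git:

cd yolov5
git pull                             # update the YOLOv5 code
pip install -U -r requirements.txt   # update dependencies, including torch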

Minimum Reproducible Example

If you continue to experience issues, please provide a minimal reproducible example of your code. This will help us investigate further. You can find more details on creating a minimal reproducible example here.

Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help! 😊

@Killuagg
Author

Killuagg commented Jun 28, 2024

Thank you for sharing the info. May I know another method that does not use TensorRT? I mean, is a solution possible that involves only the CPU, not the GPU? Sorry for asking. Also, does training on 2000 images affect the FPS? I have another model trained on 800 images and the FPS is the same.

Also, why, after I run detect.py with --source 0 (webcam), can the saved MP4 file not be played on my Raspberry Pi or on Windows 11?

@glenn-jocher
Member

Hi @Killuagg,

Thank you for your detailed follow-up! Let's address your questions and concerns step by step.

CPU-Only Optimization

If you're looking to optimize your YOLOv5 model inference on a CPU-only setup, here are a few strategies you can employ:

  1. Model Quantization: Quantizing your model can improve inference speed by reducing the precision of the weights. You can use PyTorch's built-in dynamic quantization (note that YOLOv5 checkpoints are dicts, and dynamic quantization only covers the listed module types, so gains on a conv-heavy detector are limited; a CPU-side alternative that quantizes the ONNX model directly is sketched after this list):

    import torch
    from torch.quantization import quantize_dynamic

    ckpt = torch.load('best.pt', map_location='cpu')  # YOLOv5 checkpoints are dicts
    model = ckpt['model'].float().eval()              # the model lives under the 'model' key
    quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
    torch.save(quantized_model, 'best_quantized.pt')
  2. Use a Smaller Model: If you're currently using yolov5s, consider switching to yolov5n (nano), which is designed to be more lightweight and faster:

    python detect.py --weights yolov5n.pt --img 640 --conf 0.7 --source 0
  3. Optimize Code Execution: Ensure that your code is optimized for performance. For example, using threading to handle webcam capture and model inference in parallel can help reduce bottlenecks.
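
Since you are already running best.onnx, a CPU-side alternative to point 1 is to quantize the ONNX model directly with ONNX Runtime's dynamic quantization (a sketch, assuming the onnxruntime package is installed; re-check accuracy afterwards):

from onnxruntime.quantization import QuantType, quantize_dynamic

# Writes an INT8-weight copy of the model next to the original
quantize_dynamic("best.onnx", "best_int8.onnx", weight_type=QuantType.QUInt8)

You can then point detect.py at the quantized file, e.g. python detect.py --weights best_int8.onnx --img 640 --source 0.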

Dataset Size Impact

The number of images used for training (2000 vs. 800) does not affect FPS during inference: once training is done, the model's architecture and input size determine inference cost, regardless of how much data produced the weights. FPS is driven by model size, input image size, and the computational power of your device.

Video Playback Issues

Regarding the issue with the MP4 file not playing on your Raspberry Pi and Windows 11, it could be related to the codec or the way the video is being saved. Ensure the video is saved with a widely supported codec; 'mp4v' (MPEG-4) generally works, and 'avc1' gives H.264 if your OpenCV build supports it. Here's an example of how to save the video correctly:

import cv2

# Open the webcam and create a VideoWriter with a matching frame size
cap = cv2.VideoCapture(0)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # use 'XVID' for .avi files
out = cv2.VideoWriter('output.mp4', fourcc, 20.0, (640, 480))

while cap.isOpened():
    ret, frame = cap.read()
    if ret:
        frame = cv2.resize(frame, (640, 480))  # frames must match the writer size
        out.write(frame)  # write the frame
    else:
        break

# Release everything when the job is finished
cap.release()
out.release()
cv2.destroyAllWindows()
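
To confirm which codec a saved file actually uses, ffprobe (part of ffmpeg, if installed) can inspect it:

ffprobe -v error -select_streams v:0 -show_entries stream=codec_name -of default=noprint_wrappers=1 output.mp4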

Minimum Reproducible Example

To help us better understand and resolve your issue, could you please provide a minimal reproducible example of your code? This will allow us to reproduce the bug and investigate a solution. You can find more details on creating a minimal reproducible example here. This step is crucial for us to provide accurate and effective support.

Verify Latest Versions

Lastly, please ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.

Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help! 😊

@Killuagg
Author

# Ultralytics YOLOv5 🚀, AGPL-3.0 license

"""
Run YOLOv5 detection inference on images, videos, directories, globs, YouTube, webcam, streams, etc.

Usage - sources:
$ python detect.py --weights yolov5s.pt --source 0 # webcam
img.jpg # image
vid.mp4 # video
screen # screenshot
path/ # directory
list.txt # list of images
list.streams # list of streams
'path/*.jpg' # glob
'https://youtu.be/LNwODJXcvt4' # YouTube
'rtsp://example.com/media.mp4' # RTSP, RTMP, HTTP stream

Usage - formats:
$ python detect.py --weights yolov5s.pt # PyTorch
yolov5s.torchscript # TorchScript
yolov5s.onnx # ONNX Runtime or OpenCV DNN with --dnn
yolov5s_openvino_model # OpenVINO
yolov5s.engine # TensorRT
yolov5s.mlmodel # CoreML (macOS-only)
yolov5s_saved_model # TensorFlow SavedModel
yolov5s.pb # TensorFlow GraphDef
yolov5s.tflite # TensorFlow Lite
yolov5s_edgetpu.tflite # TensorFlow Edge TPU
yolov5s_paddle_model # PaddlePaddle
"""

import argparse
import csv
import os
import platform
import sys
from pathlib import Path

import torch
import time

import pyttsx3

# Initialize the TTS engine
engine = pyttsx3.init()

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

from ultralytics.utils.plotting import Annotator, colors, save_one_box

from models.common import DetectMultiBackend
from utils.dataloaders import IMG_FORMATS, VID_FORMATS, LoadImages, LoadScreenshots, LoadStreams
from utils.general import (
    LOGGER,
    Profile,
    check_file,
    check_img_size,
    check_imshow,
    check_requirements,
    colorstr,
    cv2,
    increment_path,
    non_max_suppression,
    print_args,
    scale_boxes,
    strip_optimizer,
    xyxy2xywh,
)
from utils.torch_utils import select_device, smart_inference_mode

@smart_inference_mode()
def run(
    weights=ROOT / "best.onnx",  # model path or triton URL
    source=ROOT / "Data/images",  # file/dir/URL/glob/screen/0(webcam)
    data=ROOT / "data.yaml",  # dataset.yaml path
    imgsz=(640, 640),  # inference size (height, width)
    conf_thres=0.25,  # confidence threshold
    iou_thres=0.45,  # NMS IOU threshold
    max_det=1000,  # maximum detections per image
    device="",  # cuda device, i.e. 0 or 0,1,2,3 or cpu
    view_img=False,  # show results
    save_txt=False,  # save results to *.txt
    save_csv=False,  # save results in CSV format
    save_conf=False,  # save confidences in --save-txt labels
    save_crop=False,  # save cropped prediction boxes
    nosave=False,  # do not save images/videos
    classes=None,  # filter by class: --class 0, or --class 0 2 3
    agnostic_nms=False,  # class-agnostic NMS
    augment=False,  # augmented inference
    visualize=False,  # visualize features
    update=False,  # update all models
    project=ROOT / "runs/detect",  # save results to project/name
    name="exp",  # save results to project/name
    exist_ok=False,  # existing project/name ok, do not increment
    line_thickness=3,  # bounding box thickness (pixels)
    hide_labels=False,  # hide labels
    hide_conf=False,  # hide confidences
    half=False,  # use FP16 half-precision inference
    dnn=False,  # use OpenCV DNN for ONNX inference
    vid_stride=1,  # video frame-rate stride
):
    source = str(source)
    save_img = not nosave and not source.endswith(".txt")  # save inference images
    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
    is_url = source.lower().startswith(("rtsp://", "rtmp://", "http://", "https://"))
    webcam = source.isnumeric() or source.endswith(".streams") or (is_url and not is_file)
    screenshot = source.lower().startswith("screen")
    if is_url and is_file:
        source = check_file(source)  # download

    # Directories
    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
    (save_dir / "labels" if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Load model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
    stride, names, pt = model.stride, model.names, model.pt
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Dataloader
    bs = 1  # batch_size
    if webcam:
        view_img = check_imshow(warn=True)
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
        bs = len(dataset)
    elif screenshot:
        dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
    vid_path, vid_writer = [None] * bs, [None] * bs

    # FPS calculation
    prev_time = time.time()

    # Run inference
    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup
    seen, windows, dt = 0, [], (Profile(device=device), Profile(device=device), Profile(device=device))
    for path, im, im0s, vid_cap, s in dataset:
        current_time = time.time()
        fps = 1 / (current_time - prev_time)
        prev_time = current_time

        with dt[0]:
            im = torch.from_numpy(im).to(model.device)
            im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
            im /= 255  # 0 - 255 to 0.0 - 1.0
            if len(im.shape) == 3:
                im = im[None]  # expand for batch dim
            if model.xml and im.shape[0] > 1:
                ims = torch.chunk(im, im.shape[0], 0)

        # Inference
        with dt[1]:
            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
            if model.xml and im.shape[0] > 1:
                pred = None
                for image in ims:
                    if pred is None:
                        pred = model(image, augment=augment, visualize=visualize).unsqueeze(0)
                    else:
                        pred = torch.cat((pred, model(image, augment=augment, visualize=visualize).unsqueeze(0)), dim=0)
                pred = [pred, None]
            else:
                pred = model(im, augment=augment, visualize=visualize)

        # NMS
        with dt[2]:
            pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)

        # Define the path for the CSV file
        csv_path = save_dir / "predictions.csv"

        # Create or append to the CSV file
        def write_to_csv(image_name, prediction, confidence):
            """Writes prediction data for an image to a CSV file, appending if the file exists."""
            data = {"Image Name": image_name, "Prediction": prediction, "Confidence": confidence}
            write_header = not csv_path.is_file()  # check before open(), which creates the file
            with open(csv_path, mode="a", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=data.keys())
                if write_header:
                    writer.writeheader()
                writer.writerow(data)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            seen += 1
            if webcam:  # batch_size >= 1
                p, im0, frame = path[i], im0s[i].copy(), dataset.count
                s += f"{i}: "
            else:
                p, im0, frame = path, im0s.copy(), getattr(dataset, "frame", 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # im.jpg
            txt_path = str(save_dir / "labels" / p.stem) + ("" if dataset.mode == "image" else f"_{frame}")  # im.txt
            s += "%gx%g " % im.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            imc = im0.copy() if save_crop else im0  # for save_crop
            annotator = Annotator(im0, line_width=line_thickness, example=str(names))
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, 5].unique():
                    n = (det[:, 5] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    c = int(cls)  # integer class
                    label = names[c] if hide_conf else f"{names[c]}"
                    confidence = float(conf)
                    confidence_str = f"{confidence:.2f}"

                    if save_csv:
                        write_to_csv(p.name, label, confidence_str)

                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
                        with open(f"{txt_path}.txt", "a") as f:
                            f.write(("%g " * len(line)).rstrip() % line + "\n")

                    if save_img or save_crop or view_img:  # Add bbox to image
                        c = int(cls)  # integer class
                        label = None if hide_labels else (names[c] if hide_conf else f"{names[c]} {conf:.2f}")
                        annotator.box_label(xyxy, label, color=colors(c, True))
                    if save_crop:
                        save_one_box(xyxy, imc, file=save_dir / "crops" / names[c] / f"{p.stem}.jpg", BGR=True)

            # Stream results
            im0 = annotator.result()

            # Overlay FPS on the frame (after annotator.result() so the text is not
            # lost if the annotator used a PIL buffer internally)
            cv2.putText(im0, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)

            if view_img:
                if platform.system() == "Linux" and p not in windows:
                    windows.append(p)
                    cv2.namedWindow(str(p), cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)  # allow window resize (Linux)
                    cv2.resizeWindow(str(p), im0.shape[1], im0.shape[0])
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond

            # Save results (image with detections)
            if save_img:
                if dataset.mode == "image":
                    cv2.imwrite(save_path, im0)
                else:  # 'video' or 'stream'
                    if vid_path[i] != save_path:  # new video
                        vid_path[i] = save_path
                        if isinstance(vid_writer[i], cv2.VideoWriter):
                            vid_writer[i].release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                        save_path = str(Path(save_path).with_suffix(".mp4"))  # force *.mp4 suffix on results videos
                        vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
                    vid_writer[i].write(im0)

        # Print time (inference-only)
        LOGGER.info(f"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms")

        detections = []
        for *xyxy, conf, cls in reversed(det):
            detections.append({"label": names[int(cls)]})

        # Voice feedback for each detected object in the last processed image.
        # NOTE: engine.runAndWait() blocks this loop until speech finishes,
        # which by itself can hold the pipeline at around 1 FPS.
        for detection in detections:
            label = detection["label"]
            print(f"Detected: {label}")  # debugging print statement
            engine.say(f"Detected {label}")
            engine.runAndWait()

    # Print results
    t = tuple(x.t / seen * 1e3 for x in dt)  # speeds per image
    LOGGER.info(f"Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}" % t)
    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ""
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    if update:
        strip_optimizer(weights[0])  # update model (to fix SourceChangeWarning)

def parse_opt():
    """Parses command-line arguments for YOLOv5 detection, setting inference options and model configurations."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--weights", nargs="+", type=str, default=ROOT / "yolov5s.pt", help="model path or triton URL")
    parser.add_argument("--source", type=str, default=ROOT / "data/images", help="file/dir/URL/glob/screen/0(webcam)")
    parser.add_argument("--data", type=str, default=ROOT / "data/coco128.yaml", help="(optional) dataset.yaml path")
    parser.add_argument("--imgsz", "--img", "--img-size", nargs="+", type=int, default=[640], help="inference size h,w")
    parser.add_argument("--conf-thres", type=float, default=0.25, help="confidence threshold")
    parser.add_argument("--iou-thres", type=float, default=0.45, help="NMS IoU threshold")
    parser.add_argument("--max-det", type=int, default=1000, help="maximum detections per image")
    parser.add_argument("--device", default="", help="cuda device, i.e. 0 or 0,1,2,3 or cpu")
    parser.add_argument("--view-img", action="store_true", help="show results")
    parser.add_argument("--save-txt", action="store_true", help="save results to *.txt")
    parser.add_argument("--save-csv", action="store_true", help="save results in CSV format")
    parser.add_argument("--save-conf", action="store_true", help="save confidences in --save-txt labels")
    parser.add_argument("--save-crop", action="store_true", help="save cropped prediction boxes")
    parser.add_argument("--nosave", action="store_true", help="do not save images/videos")
    parser.add_argument("--classes", nargs="+", type=int, help="filter by class: --classes 0, or --classes 0 2 3")
    parser.add_argument("--agnostic-nms", action="store_true", help="class-agnostic NMS")
    parser.add_argument("--augment", action="store_true", help="augmented inference")
    parser.add_argument("--visualize", action="store_true", help="visualize features")
    parser.add_argument("--update", action="store_true", help="update all models")
    parser.add_argument("--project", default=ROOT / "runs/detect", help="save results to project/name")
    parser.add_argument("--name", default="exp", help="save results to project/name")
    parser.add_argument("--exist-ok", action="store_true", help="existing project/name ok, do not increment")
    parser.add_argument("--line-thickness", default=3, type=int, help="bounding box thickness (pixels)")
    parser.add_argument("--hide-labels", default=False, action="store_true", help="hide labels")
    parser.add_argument("--hide-conf", default=False, action="store_true", help="hide confidences")
    parser.add_argument("--half", action="store_true", help="use FP16 half-precision inference")
    parser.add_argument("--dnn", action="store_true", help="use OpenCV DNN for ONNX inference")
    parser.add_argument("--vid-stride", type=int, default=1, help="video frame-rate stride")
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(vars(opt))
    return opt


def main(opt):
    """Executes YOLOv5 model inference with given options, checking requirements before running the model."""
    check_requirements(ROOT / "requirements.txt", exclude=("tensorboard", "thop"))
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)

I am using my modified detect1.py file (based on YOLOv5 PyTorch). I already followed the code you showed, but it still cannot show the video. Can you help me modify the code I shared?

@glenn-jocher
Member

Hi @Killuagg,

Thank you for sharing your detailed code and setup. Let's address your concerns step by step to ensure we can help you effectively.

Video Playback Issues

The issue with the video not playing could be related to how the video is being saved or displayed. Let's ensure that the video is saved correctly and that the display logic is handled properly.

Ensure Correct Video Saving

First, let's ensure that the video is saved using a widely supported codec. Here's a snippet to ensure the video is saved correctly:

import cv2

# Open the webcam and create a VideoWriter with a matching frame size
cap = cv2.VideoCapture(0)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # use 'XVID' for .avi files
out = cv2.VideoWriter('output.mp4', fourcc, 20.0, (640, 480))

while cap.isOpened():
    ret, frame = cap.read()
    if ret:
        frame = cv2.resize(frame, (640, 480))  # frames must match the writer size
        out.write(frame)  # write the frame
    else:
        break

# Release everything when the job is finished
cap.release()
out.release()
cv2.destroyAllWindows()

Ensure Correct Video Display

Next, let's ensure that the video display logic is handled correctly. Here’s a simplified version of your detect.py script focusing on video display:

import cv2
import time
import torch
from models.common import DetectMultiBackend
from utils.dataloaders import LoadStreams
from utils.general import check_img_size, non_max_suppression, scale_boxes
from ultralytics.utils.plotting import Annotator, colors

# Load model
device = torch.device('cpu')  # Change to 'cuda' if you have a GPU
model = DetectMultiBackend('best.onnx', device=device)
stride, names = model.stride, model.names
imgsz = check_img_size((640, 640), s=stride)  # check image size

# Dataloader
source = '0'  # webcam
dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=True)

# Run inference
model.warmup(imgsz=(1, 3, *imgsz))  # warmup
for path, im, im0s, vid_cap, s in dataset:
    im = torch.from_numpy(im).to(device)
    im = im.float() / 255.0  # 0 - 255 to 0.0 - 1.0
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim

    # Inference
    pred = model(im)

    # NMS
    pred = non_max_suppression(pred, 0.25, 0.45, None, False, max_det=1000)

    # Process predictions
    for i, det in enumerate(pred):  # per image
        im0 = im0s[i].copy()
        annotator = Annotator(im0, line_width=3, example=str(names))
        if len(det):
            det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()
            for *xyxy, conf, cls in reversed(det):
                label = f'{names[int(cls)]} {conf:.2f}'
                annotator.box_label(xyxy, label, color=colors(int(cls), True))

        # Display results
        cv2.imshow(str(path), im0)
        if cv2.waitKey(1) == ord('q'):  # 1 millisecond
            break

cv2.destroyAllWindows()

Verify Latest Versions

Please ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.

Minimum Reproducible Example

If the issue persists, please provide a minimal reproducible example of your code. This will help us investigate further. You can find more details on creating a minimal reproducible example here. This step is crucial for us to provide accurate and effective support.

Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help! 😊

@Killuagg
Author

I am sorry, I am confused about where I need to place the code inside detect.py.

@glenn-jocher
Member

Hi @Killuagg,

Thank you for your patience and for providing more details about your setup. Let's clarify where to place the code within your detect.py script to ensure everything runs smoothly.

Integrating the Code into detect.py

  1. Import Necessary Libraries: Ensure you have all the necessary imports at the beginning of your script.
  2. Initialize the Model and Dataloader: This should be done before the main inference loop.
  3. Run Inference and Display Results: This is where the main logic of processing each frame and displaying the results will go.

Here's a structured example to guide you:

import argparse

import cv2
import pyttsx3
import torch

from models.common import DetectMultiBackend
from utils.dataloaders import LoadStreams
from utils.general import check_img_size, non_max_suppression, scale_boxes
from ultralytics.utils.plotting import Annotator, colors

# Initialize the TTS engine
engine = pyttsx3.init()

# Define the main function
def run(weights='best.onnx', source='0', imgsz=(640, 640), conf_thres=0.25, iou_thres=0.45, max_det=1000, device='cpu', view_img=False):
    # Load model
    device = torch.device(device)
    model = DetectMultiBackend(weights, device=device)
    stride, names = model.stride, model.names
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Dataloader
    dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=True)

    # Run inference
    model.warmup(imgsz=(1, 3, *imgsz))  # warmup
    for path, im, im0s, vid_cap, s in dataset:
        im = torch.from_numpy(im).to(device)
        im = im.float() / 255.0  # 0 - 255 to 0.0 - 1.0
        if len(im.shape) == 3:
            im = im[None]  # expand for batch dim

        # Inference
        pred = model(im)

        # NMS
        pred = non_max_suppression(pred, conf_thres, iou_thres, None, False, max_det=max_det)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            im0 = im0s[i].copy()
            annotator = Annotator(im0, line_width=3, example=str(names))
            if len(det):
                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()
                for *xyxy, conf, cls in reversed(det):
                    label = f'{names[int(cls)]} {conf:.2f}'
                    annotator.box_label(xyxy, label, color=colors(int(cls), True))

            # Display results
            if view_img:
                cv2.imshow(str(path), im0)
                if cv2.waitKey(1) == ord('q'):  # 1 millisecond
                    break

            # Generate voice feedback (note: runAndWait() blocks until speech
            # finishes, which will itself reduce FPS)
            detections = [{'label': names[int(cls)]} for *xyxy, conf, cls in reversed(det)]
            for detection in detections:
                engine.say(f"Detected {detection['label']}")
                engine.runAndWait()

    cv2.destroyAllWindows()

# Define the argument parser
def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='best.onnx', help='model path')
    parser.add_argument('--source', type=str, default='0', help='source')
    parser.add_argument('--imgsz', type=int, nargs='+', default=[640, 640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='cpu', help='cuda device or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    return parser.parse_args()

# Main entry point
if __name__ == "__main__":
    opt = parse_opt()
    run(**vars(opt))

Explanation:

  1. Imports: Ensure all necessary libraries are imported at the beginning.
  2. Model Initialization: The model is loaded and initialized before the main loop.
  3. Inference Loop: The loop processes each frame, performs inference, and displays the results.
  4. Voice Feedback: The text-to-speech engine provides voice feedback for detected objects.

Next Steps:

  • Verify Latest Versions: Ensure you are using the latest versions of torch and the YOLOv5 repository.
  • Minimum Reproducible Example: If you encounter further issues, please provide a minimal reproducible example. This will help us investigate and resolve the issue more effectively. You can find more details on creating a minimal reproducible example here.

Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help! 😊
