First frame takes long, others are faster #25

Closed
AnnaFHub opened this issue Jan 8, 2024 · 3 comments

AnnaFHub commented Jan 8, 2024

Hello,

I have two questions:

  1. I was playing around with your project and noticed that when I run the model on a video stream, the first frame takes quite some time to finish, while subsequent frames are faster. Why is that?

I am working in a jupyter notebook, a minimal version of it would look like this:

import time

import cv2
from easy_ViTPose import VitInference

model_path = '.\\models\\vitpose-l-coco_25.pth'
yolo_path = '.\\models\\yolov8l.pt'
model = VitInference(model_path, yolo_path, model_name='l', yolo_size=544, is_video=True, device=None, det_class="human")

path = "D:\\Documents\\xxx\\"
source = "file_name"
source_type = ".mp4"

cap = cv2.VideoCapture(path + source + source_type)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(path + source + '_Test.mp4', fourcc, 25.0, (720, 1280))

while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        start = time.time()

        # Run inference on the frame
        keypoints = model.inference(frame)

        # Draw the skeleton on the frame
        img = model.draw(show_yolo=True)

        # Save the frame
        out.write(img)

        # Display the frame
        cv2.imshow('Output', img)

        end = time.time()
        print('time passed (in sec): ')
        print(end - start)

        # Break the loop if 'q' is pressed
        # TODO: allow closing the window with the X button
        key = cv2.waitKey(1)
        if key == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
out.release()
cv2.destroyAllWindows()

When executing the last cell, it takes quite some time (~10-20 s) before anything happens. Do you know why that is?

I sometimes also get the following warning when detection finally starts on the first frame:
WARNING NMS time limit 0.550s exceeded

  2. Is it possible to detect the pose only for people in a specific area of the image, defined by a rectangle in pixel coordinates? For example, in the following image I only want the pose for the person in the red rectangle. Would inference be faster if ViT only has to consider one person instead of, e.g., three?
    (example image)
JunkyByte (Owner) commented Jan 10, 2024

Hello! Sorry for the delay, but I was overwhelmed the last few days!

Regarding the speed: I also notice longer times for the first frame, but not as extreme as what you are seeing! It seems that YOLO takes some time to set up; I do not have much information about that. The first time I ran the script, it took around 5 seconds to set up YOLO and run the first inference, but on subsequent runs even the first frame takes <1 second.
I am running on a Mac using MPS as the backend, so it might be different if you are using CUDA or CPU. What is your setup in more detail, so that I can try to reproduce the problem?
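One common mitigation for this kind of one-time setup cost (a sketch, not part of easy_ViTPose itself): run a single throwaway inference on a blank frame before the timing loop, so weights loading, CUDA kernel compilation, and similar warm-up work are paid up front. The helper below only assumes a callable that accepts one frame; `model.inference` in the comment refers to the snippet above.

```python
import time

import numpy as np

def warm_up(infer, width=720, height=1280):
    """Run one throwaway inference on a blank frame so one-time setup
    (weights loading, CUDA kernel compilation, autotuning) happens before
    the real video loop starts. Returns the warm-up time in seconds."""
    dummy = np.zeros((height, width, 3), dtype=np.uint8)
    t0 = time.time()
    infer(dummy)  # e.g. model.inference(dummy) from the snippet above
    return time.time() - t0
```

With the warm-up done once after constructing the model, the per-frame timings printed inside the loop should reflect steady-state speed rather than setup cost.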

I also do climbing, so I enjoy your use case. Do you obtain decent results for this? I tried in the past and did not obtain a very good pose.

Regarding question 2: yes, predicting only in the red area is possible and will lead to faster inference, especially if there are more poses in the outer part of the image. You can simply crop the rectangle out of the image before passing it to inference: x = img[box coords] (...) model.inference(x, ...). Is that enough?
Also, make sure you are not using a very high resolution for both the video and the YOLO inference; yolo_size=320 is usually more than enough. You use YOLO l: did you try YOLO s and get bad results? I would expect the large model to be quite a bit slower.
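A minimal sketch of that cropping idea. The ROI coordinates are made up for illustration, and the keypoint layout is an assumption (an (N, 2) array of (x, y) pixel coordinates), not the easy_ViTPose output format:

```python
import numpy as np

def crop_roi(frame, box):
    """Return the region of `frame` given box = (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    return frame[y1:y2, x1:x2]

def to_full_frame(kpts_xy, box):
    """Shift (x, y) keypoints predicted on the crop back into full-frame
    coordinates by adding the crop's top-left corner."""
    x1, y1 = box[0], box[1]
    return kpts_xy + np.array([x1, y1])

# Usage sketch (model.inference is the call from the snippet above;
# box values are illustrative):
#   box = (200, 0, 520, 1280)
#   keypoints = model.inference(crop_roi(frame, box))
#   then map each pose's (x, y) back with to_full_frame(..., box)
```

The shift back to full-frame coordinates matters if you want to draw the skeleton on the original image rather than on the crop.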

Edit: I never got the NMS message; it might have to do with your video. If you can share the input, I can try your particular example :)

AnnaFHub (Author) commented Jan 10, 2024

Hello, no worries!

Yes, I also noticed that it only took that long after I restarted the Jupyter notebook. If I just rerun the one cell, it does not take that long anymore. The fact that it takes so long is probably down to my hardware: I am working on a laptop with a GTX 1050 and use CUDA 12.3.

Oh, it's always lovely to meet fellow climbers. :) I have to say that the higher the climber gets up the wall, the worse the results tend to get. However, ViTPose (the large version) gave me the best results so far. I also tried YOLOv8 Pose, AlphaPose, OpenPose, and the pose model from MediaPipe, and they were not as good.

Thanks for the input and tips regarding question 2, I will try them out.
Yes, I tried YOLO s, but that got me nowhere. I have the prediction confidence for YOLO l set to 0.1, as otherwise the climber would not be detected on the wall, especially when higher up.

Regarding the NMS message, I don't get it every time either, and I have not yet figured out why: with the same video and settings, I sometimes get it and sometimes I don't. But I think this might be a YOLO issue; at least I got the same warning when trying out YOLOv8 Pose.
I uploaded the video to Drive if you want to try it anyway: https://drive.google.com/file/d/1ubkf065-YBoPQrVdh_t7JZRORtLRtMgz/view?usp=sharing

JunkyByte (Owner) commented Jan 10, 2024

I tried it, and I see what you mean about YOLO losing tracking. With such a low confidence threshold, the NMS error might be caused by the model finding many 'people' boxes (even around the same climber) and the NMS then failing. But yes, that is related to YOLO.

One approach that might improve your YOLO detection: if you know there is a single climber going up the wall, you could restrict detection to a box around the person at each frame (and fall back to the full image on failure). Since the person proceeds slowly up the wall and will be very close in subsequent frames, this might improve your results by making YOLO (even with a low confidence threshold) search only around the previous position.
To calculate the box, you can use the bounds of the pose in subsequent frames and enlarge it a bit, or check my code and make it output the YOLO boxes as well; I do not think it is possible to access that information right now without changing the code.
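A sketch of that tracking box. The function name and margin value are illustrative, and the previous frame's keypoints are assumed to be an (N, 2) array of (x, y) pixel coordinates:

```python
import numpy as np

def pose_search_box(kpts_xy, frame_w, frame_h, margin=0.25):
    """Bounding box of the previous frame's keypoints, enlarged by
    `margin` of its width/height on each side and clipped to the frame.
    Falls back to the full frame when there is no previous pose
    (e.g. detection failed on the last frame)."""
    if kpts_xy is None or len(kpts_xy) == 0:
        return 0, 0, frame_w, frame_h
    x1, y1 = kpts_xy.min(axis=0)
    x2, y2 = kpts_xy.max(axis=0)
    mx = (x2 - x1) * margin
    my = (y2 - y1) * margin
    return (int(max(0, x1 - mx)), int(max(0, y1 - my)),
            int(min(frame_w, x2 + mx)), int(min(frame_h, y2 + my)))
```

Each frame you would crop to this box before inference and shift the resulting keypoints back by the box's top-left corner; resetting to the full frame whenever no pose comes back keeps the tracker from getting stuck.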

If I have the time, I will run further tests using CUDA to check whether I also see such large slowdowns on the first frame.
