First frame takes long, others are faster #25

Closed
AnnaFHub opened this issue Jan 8, 2024 · 3 comments

AnnaFHub commented Jan 8, 2024

Hello,

I have two questions:

  1. I was playing around with your project and noticed that when I run the model on a video stream, the first frame takes quite some time to finish, while subsequent frames are faster. Why is that?

I am working in a jupyter notebook, a minimal version of it would look like this:

import time

import cv2
from easy_ViTPose import VitInference

model_path = '.\\models\\vitpose-l-coco_25.pth'
yolo_path = '.\\models\\yolov8l.pt'
model = VitInference(model_path, yolo_path, model_name='l', yolo_size=544, is_video=True, device=None, det_class="human")

path = "D:\\Documents\\xxx\\"
source = "file_name"
source_type = ".mp4"

cap = cv2.VideoCapture(path + source + source_type)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(path + source + '_Test.mp4', fourcc, 25.0, (720, 1280))

while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        start = time.time()

        # Run inference on the frame
        keypoints = model.inference(frame)

        # Draw the skeleton on the frame
        img = model.draw(show_yolo=True)

        # Save the frame
        out.write(img)

        # Display the frame
        cv2.imshow('Output', img)

        end = time.time()
        print('time passed (in sec): ')
        print(end - start)

        # Break the loop if 'q' is pressed
        # TODO: allow closing the window with the X button
        key = cv2.waitKey(1)
        if key == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
out.release()
cv2.destroyAllWindows()

When executing the last cell, it takes quite some time (~10-20 s) before anything happens. Do you know why that is?

I sometimes also get the following warning when detection finally starts on the first frame:
WARNING NMS time limit 0.550s exceeded

  2. Is it possible to detect the pose only for people in a specific area of the image, defined by a rectangle in pixel coordinates? For example, in the following image I only want the pose for the person in the red rectangle. Would inference be faster if ViT only has to consider one person instead of, e.g., three?
    (example image)
JunkyByte (Owner) commented Jan 10, 2024

Hello! Sorry for the delay, but I was overwhelmed the last few days!

Regarding the speed: I also notice longer times for the first frame, but not as extreme as what you are seeing! It seems that YOLO takes some time to set up; I do not have much information about that. The first time I ran the script, it took around 5 seconds to set up YOLO and run the first inference, but on subsequent runs even the first frame takes <1 second.
I am running on a Mac using MPS as the backend, so it might be different if you are using CUDA or CPU. What is your setup in more detail, so that I can try to reproduce the problem?
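One common mitigation for this kind of one-time setup cost (a sketch, not part of easy_ViTPose itself): run a single throwaway inference on a blank frame before the timing loop, so weights loading, CUDA kernel compilation, and similar warm-up work are paid up front. The helper below only assumes a callable that accepts one frame; `model.inference` in the comment refers to the snippet above.

```python
import time

import numpy as np

def warm_up(infer, width=720, height=1280):
    """Run one throwaway inference on a blank frame so one-time setup
    (weights loading, CUDA kernel compilation, autotuning) happens before
    the real video loop starts. Returns the warm-up time in seconds."""
    dummy = np.zeros((height, width, 3), dtype=np.uint8)
    t0 = time.time()
    infer(dummy)  # e.g. model.inference(dummy) from the snippet above
    return time.time() - t0
```

With the warm-up done once after constructing the model, the per-frame timings printed inside the loop should reflect steady-state speed rather than setup cost.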

I also do climbing, so I enjoy your use case. Do you obtain decent results for this? I tried in the past and did not obtain a very good pose.

Regarding question 2: yes, predicting only in the red area is possible and will lead to faster inference, especially if there are more poses in the outer part of the image. You can simply crop the rectangle out of the image before passing it to inference: x = img[box coords] (...) model.inference(x, ...). Is that enough?
Also, make sure you are not using a very high resolution for both the video and the YOLO inference; yolo_size=320 is usually more than enough. You use YOLO l: did you try YOLO s and get bad results? I would expect the large model to be quite a bit slower.
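A minimal sketch of that cropping idea. The ROI coordinates are made up for illustration, and the keypoint layout is an assumption (an (N, 2) array of (x, y) pixel coordinates), not the easy_ViTPose output format:

```python
import numpy as np

def crop_roi(frame, box):
    """Return the region of `frame` given box = (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    return frame[y1:y2, x1:x2]

def to_full_frame(kpts_xy, box):
    """Shift (x, y) keypoints predicted on the crop back into full-frame
    coordinates by adding the crop's top-left corner."""
    x1, y1 = box[0], box[1]
    return kpts_xy + np.array([x1, y1])

# Usage sketch (model.inference is the call from the snippet above;
# box values are illustrative):
#   box = (200, 0, 520, 1280)
#   keypoints = model.inference(crop_roi(frame, box))
#   then map each pose's (x, y) back with to_full_frame(..., box)
```

The shift back to full-frame coordinates matters if you want to draw the skeleton on the original image rather than on the crop.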

Edit: I never got the NMS message; it might have to do with your video. If you can share the input, I can try your particular example :)

AnnaFHub (Author) commented Jan 10, 2024

Hello, no worries!

Yes, I also noticed that it only took that long after I restarted the Jupyter notebook. If I just rerun the one cell, it does not take that long anymore. The fact that it takes so long is probably down to my hardware: I am working on a laptop with a GTX 1050 and use CUDA 12.3.

Oh, it's always lovely to meet fellow climbers. :) I have to say that the higher the climber gets up the wall, the worse the results tend to get. However, ViTPose (the large version) gave me the best results so far. I also tried YOLOv8 Pose, AlphaPose, OpenPose, and the pose model from MediaPipe, and they were not as good.

Thanks for the input and tips regarding question 2, I will try them out.
Yes, I tried YOLO s, but that got me nowhere. I have the prediction confidence for YOLO l set to 0.1, as otherwise the climber would not be detected on the wall, especially when higher up.

Regarding the NMS message, I don't get it every time either, and I have not yet figured out why: with the same video and settings, I sometimes get it and sometimes I don't. But I think this might be a YOLO issue; at least I got the same warning when trying out YOLOv8 Pose.
I uploaded the video to Drive if you want to try it anyway: https://drive.google.com/file/d/1ubkf065-YBoPQrVdh_t7JZRORtLRtMgz/view?usp=sharing

JunkyByte (Owner) commented Jan 10, 2024

I tried it, and I see what you mean about YOLO losing tracking. With such a low confidence threshold, the NMS error might be caused by the model finding many 'people' boxes (even around the same climber) and the NMS then failing. But yes, that is related to YOLO.

One approach that might improve your YOLO detection: if you know there is a single climber going up the wall, you could restrict detection to a box around the person at each frame (and fall back to the full image on failure). Since the person proceeds slowly up the wall and will be very close in subsequent frames, this might improve your results by making YOLO (even with a low confidence threshold) search only around the previous position.
To calculate the box, you can use the bounds of the pose in subsequent frames and enlarge it a bit, or check my code and make it output the YOLO boxes as well; I do not think it is possible to access that information right now without changing the code.
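A sketch of that tracking box. The function name and margin value are illustrative, and the previous frame's keypoints are assumed to be an (N, 2) array of (x, y) pixel coordinates:

```python
import numpy as np

def pose_search_box(kpts_xy, frame_w, frame_h, margin=0.25):
    """Bounding box of the previous frame's keypoints, enlarged by
    `margin` of its width/height on each side and clipped to the frame.
    Falls back to the full frame when there is no previous pose
    (e.g. detection failed on the last frame)."""
    if kpts_xy is None or len(kpts_xy) == 0:
        return 0, 0, frame_w, frame_h
    x1, y1 = kpts_xy.min(axis=0)
    x2, y2 = kpts_xy.max(axis=0)
    mx = (x2 - x1) * margin
    my = (y2 - y1) * margin
    return (int(max(0, x1 - mx)), int(max(0, y1 - my)),
            int(min(frame_w, x2 + mx)), int(min(frame_h, y2 + my)))
```

Each frame you would crop to this box before inference and shift the resulting keypoints back by the box's top-left corner; resetting to the full frame whenever no pose comes back keeps the tracker from getting stuck.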

If I have the time, I will run further tests using CUDA to check whether I also see such large slowdowns on the first frame.
