Evaluation on coco dataset #33

Closed
omkaar718 opened this issue Jul 2, 2024 · 5 comments

Comments

@omkaar718
Contributor

omkaar718 commented Jul 2, 2024

The results of this implementation on the coco val dataset seem considerably lower than those reported in the paper.

  • Model: ViT-B
  • YOLOv8 detector model: yolov8x
  • YOLOv8 threshold: 0.35
  • YOLOv8 image size: 640
  • mAP@0.5:0.95 obtained on coco val data: 0.446, mAP@0.5: 0.589.
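
For context on the numbers above: COCO keypoint mAP is computed from Object Keypoint Similarity (OKS), and mAP@0.5:0.95 averages AP over OKS thresholds from 0.50 to 0.95 in steps of 0.05. Below is a minimal sketch of OKS for a single prediction/ground-truth pair. The function name `oks` and its argument layout are made up for illustration and are not part of this repository; the per-joint sigmas are the ones used by the standard COCO keypoint evaluation. It ignores the special case the reference evaluator applies when a ground truth has no labeled keypoints.

```python
import math

# Per-keypoint sigmas from the standard COCO keypoint evaluation (17 body joints).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one prediction and one ground truth.

    pred, gt:    lists of (x, y) tuples, one per keypoint
    visibility:  ground-truth visibility flags (0 = not labeled)
    area:        ground-truth object area in pixels (the scale term)
    """
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2 + 1e-12))
        labeled += 1
    # Simplification: return 0.0 when nothing is labeled.
    return total / labeled if labeled else 0.0
```

A prediction that matches the ground truth exactly scores 1.0, and the score decays as keypoints drift, faster for small objects and for joints with small sigmas (such as eyes and nose).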
@JunkyByte
Owner

JunkyByte commented Jul 2, 2024

Hello! Thank you for testing this. This project started as a fork of https://github.com/jaehyunnn/ViTPose_pytorch, just to improve the inference pipeline. Can you check whether you obtain similar results with that implementation?

Also, if you don't mind sharing the code you use for eval, I could run some tests, although I won't have time in the next couple of weeks.

Also, can you check the mAP you get with the detector, or try running with ground-truth bboxes? They report "Using detection results from a detector that obtains 56 mAP on person".

Thanks
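
One way to run the ground-truth bbox check suggested above is to turn the annotated person boxes into pseudo-detections and feed those to the pose model instead of the detector output. The sketch below assumes the standard COCO annotation layout (`annotations` entries with `image_id`, `category_id`, `bbox` as `[x, y, w, h]`, `iscrowd`, `num_keypoints`) and that person is category 1. The helper name `gt_person_boxes_as_detections` is hypothetical and not part of this repository.

```python
def gt_person_boxes_as_detections(coco_annotations, person_category_id=1):
    """Convert ground-truth person boxes into detection-style records.

    coco_annotations: dict loaded from a COCO keypoint annotation JSON file.
    Returns a list of dicts shaped like COCO detection results, with score 1.0.
    """
    detections = []
    for ann in coco_annotations["annotations"]:
        if ann["category_id"] != person_category_id:
            continue
        if ann.get("iscrowd", 0):
            continue  # crowd regions are not single-person boxes
        if ann.get("num_keypoints", 0) == 0:
            continue  # no labeled keypoints, nothing to evaluate against
        detections.append({
            "image_id": ann["image_id"],
            "category_id": person_category_id,
            "bbox": list(ann["bbox"]),  # [x, y, width, height]
            "score": 1.0,               # ground truth is treated as fully confident
        })
    return detections
```

If the keypoint mAP with these boxes is close to the paper's ground-truth-bbox numbers, the gap is more likely to come from the detector than from the pose model.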

@JunkyByte
Owner

Hi, I did some checks but I cannot give you an answer yet. I found that yolov8 had problems on MPS, in case you are running the evaluation on a Mac. Updating the Ultralytics package solves the problem (I updated the requirements).

@omkaar718
Contributor Author

@JunkyByte Thank you for your response!
I have opened a PR (#34) with the COCO evaluation code. The README has been updated with instructions for using it.

@omkaar718
Contributor Author

omkaar718 commented Jul 3, 2024

@JunkyByte
I found person detection results provided in the official implementation's README: https://github.com/ViTAE-Transformer/ViTPose/blob/main/docs/en/tasks/2d_body_keypoint.md#:~:text=Please%20download%20from%20OneDrive%20or%20GoogleDrive
I'm not sure these are the exact ones they used, but the results have improved drastically and are now close to those obtained with the official implementation.

mAP@0.5:0.95, detector threshold = 0.5 to filter out low confidence detection bboxes:

Therefore, the bbox detections from yolov8 could be the main reason behind the low scores in this pipeline.
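
The threshold filtering mentioned above can be sketched as follows. This is only an illustration, assuming the detections are a list of COCO-style result records (`image_id`, `category_id`, `bbox`, `score`) with person as category 1. The function name `filter_detections` is hypothetical, and the actual PR may do this differently.

```python
from collections import defaultdict


def filter_detections(detections, score_threshold=0.5, category_id=1):
    """Keep confident person boxes and group them by image.

    detections: list of COCO-style detection dicts.
    Returns a dict mapping image_id -> list of [x, y, w, h] boxes.
    """
    boxes_per_image = defaultdict(list)
    for det in detections:
        if det["category_id"] != category_id:
            continue
        if det["score"] < score_threshold:
            continue  # drop low-confidence boxes before pose estimation
        boxes_per_image[det["image_id"]].append(det["bbox"])
    return dict(boxes_per_image)
```

Each surviving box would then be cropped and passed to the pose model, so the threshold directly controls how many false-positive people end up in the keypoint evaluation.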

@JunkyByte
Owner

@omkaar718 thank you very much for looking into this. I'm busy these days, but I checked your PR and will merge it in the next few days, so thanks again.

Applying the models to videos, I see qualitatively good results, so it might indeed be that yolo does not work well on the coco val images.

I will get back to you :) have a nice day!


2 participants