ViT Pose Pipeline Example #794
Conversation
LGTM through engine forward.
- any idea on the complexity of postprocessing?
- for stage-zero pipelines, let's get them added first, potentially into an examples directory
Nice! Clean integration.
Perhaps include some of the contents of the PR description as a README in the example folder?
LGTM pending comment on removing changes to task.py
+1 to a simple README
🔥
Stage 0 exploration for the ViT Pose model
Source: https://github.com/ViTAE-Transformer/ViTPose
Installation
Follow the instructions in the readme file. Notes:
- Installing mmcv takes a lot of time and may often look like it is stuck. Be patient; it will eventually terminate successfully.
- Changed the version from 1.3 to 1.2 to avoid CUDA errors (at least on my server, which does not support the CUDA setup for 1.3).
Export
Exporting the sample ONNX model is quite easy. Before running the ONNX export, one needs to manually install timm, onnx and onnxruntime. Then, launch the export script. The first argument is a config file (for ViTPose-B); the second argument is the .pth checkpoint (weights). Both can be found on the main site of the repository. The resulting model is about 400 MB.
Benchmarking in DeepSparse
Naive benchmarking shows that, for the dense model, the engine is roughly 2x faster than ORT.
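The benchmark numbers themselves are not included in this excerpt. As a rough sketch, a naive latency comparison of the kind described can be set up as below; the matmul workloads are stand-ins for the actual DeepSparse engine call and ONNX Runtime session call, which are not shown in the PR:

```python
import time
import numpy as np

def mean_latency_ms(run, warmup=3, iters=20):
    """Average wall-clock latency of a zero-argument callable, in milliseconds."""
    for _ in range(warmup):
        run()
    start = time.perf_counter()
    for _ in range(iters):
        run()
    return (time.perf_counter() - start) / iters * 1e3

# Stand-in workloads: in the real comparison these would be the DeepSparse
# engine forward pass and the ONNX Runtime session.run call on the exported model.
x = np.random.rand(128, 128).astype(np.float32)
engine_like = lambda: x @ x            # pretend "engine" forward
ort_like = lambda: [x @ x, x @ x]      # pretend "ORT" forward (heavier)

speedup = mean_latency_ms(ort_like) / mean_latency_ms(engine_like)
print(f"speedup vs. ORT: {speedup:.2f}x")
```

For the real model, the same harness would wrap the two runtimes' inference calls with identical inputs and batch size.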
Postprocessing
ViT-Pose might be our first candidate for a "composed" deepsparse pipeline. It is a top-down pose estimation approach.
We take an image and run object detection (e.g., find all the humans in the image).
We pass the cropped bounding boxes to ViT to get a (batch, no_keypoints, h, w) array. To decode this array, according to the original paper, we need a simple composition of transposed convolutions.
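As a minimal illustration of what "decoding" such an array means (this is a plain per-heatmap argmax, not the transposed-convolution decoder from the paper or any code in this PR):

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Naive keypoint decoding: argmax of each heatmap.

    heatmaps: (batch, num_keypoints, h, w) float array.
    Returns (batch, num_keypoints, 2) integer (x, y) coordinates.
    """
    b, k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(b, k, -1).argmax(axis=-1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=-1)

# Toy example: one hot pixel per keypoint.
hm = np.zeros((1, 2, 8, 8), dtype=np.float32)
hm[0, 0, 3, 5] = 1.0
hm[0, 1, 6, 2] = 1.0
print(decode_heatmaps(hm))  # keypoint 0 -> (x=5, y=3), keypoint 1 -> (x=2, y=6)
```

Real top-down pipelines typically refine the argmax (e.g., sub-pixel offsets) and map the coordinates back from the crop into the original image frame.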
What I do naively in this PR: I "squash" the array to (h, w) and then overlay it on the original image. We can see that the heatmap roughly coincides with the joints of the model.
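The "squash" step can be sketched roughly as follows (whether the PR reduces over the keypoint axis with max or sum is an assumption, and the actual image blending, e.g. with OpenCV, is omitted):

```python
import numpy as np

def squash_heatmaps(heatmaps):
    """Collapse (num_keypoints, h, w) heatmaps into one (h, w) map in [0, 1]."""
    combined = heatmaps.max(axis=0)  # max (or sum) over the keypoint axis
    lo, hi = combined.min(), combined.max()
    return (combined - lo) / (hi - lo + 1e-8)

# 17 keypoints follows the COCO convention -- an assumption, not taken from the PR.
hm = np.random.rand(17, 64, 48).astype(np.float32)
overlay = squash_heatmaps(hm)  # (64, 48), normalized, ready to blend with the image
```

Overlaying would then be an alpha blend of the (resized) squashed map with the original crop, e.g. `0.6 * image + 0.4 * colored_overlay`.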