Update README.md #931
Merged, Jun 1, 2023 (4 commits)
File: src/deepsparse/open_pif_paf/README.md (55 changes: 28 additions & 27 deletions)

# OpenPifPaf Inference Pipelines

The DeepSparse integration of the OpenPifPaf model is a work in progress. Check back soon for updates.
This README serves as a placeholder for internal information that may be useful for further development.

This directory contains the DeepSparse pipeline for OpenPifPaf.

## Example Use in DeepSparse Python API

```python
from deepsparse import Pipeline
pipeline = Pipeline.create(task="open_pif_paf")  # assumed task name; the creation line is collapsed in the source
predictions = pipeline(images=['dancers.jpg'])
predictions[0].scores
>> scores=[0.8542259724243828, 0.7930507659912109]
```
### Output CifCaf Fields
Alternatively, instead of returning the detected poses, the pipeline can return the intermediate output: the CifCaf fields.
This is the representation returned directly by the neural network, but not yet processed by the matching algorithm.

```python
...
predictions = pipeline(images=['dancers.jpg'])
predictions.fields
```

## Validation Script
This section describes how to run validation of the ONNX model/SparseZoo stub.

### Dataset
For evaluation, you need to download the dataset. The [Open Pif Paf documentation](https://openpifpaf.github.io/)
thoroughly describes how to prepare different datasets for validation. This is the example for the `crowdpose` dataset:

```bash
mkdir data-crowdpose
unzip images.zip
# Now you can use the standard openpifpaf.train and openpifpaf.eval
# commands as documented in Training with --dataset=crowdpose.
```
### Create an ONNX Model

```bash
python3 -m openpifpaf.export_onnx --input-width 641 --input-height 641
```

### Validation Command
Once the dataset has been downloaded, run the command:
```bash
deepsparse.pose_estimation.eval --model-path openpifpaf-resnet50.onnx --dataset cocokp --image_size 641
This should result in evaluation output similar to this:

````
...
````


### Expected Output

## Necessity of the External OpenPifPaf Helper Function

This diagram from the original paper illustrates that once the input image has been encoded into PIF or PAF*
tensors, they need to be decoded later into human-understandable annotations. (* PIF and PAF are equivalent to CIF and CAF with the same meaning but different naming conventions per the original authors.)

<img width="678" alt="image" src="https://user-images.githubusercontent.com/97082108/203295520-42fa325f-8a94-4241-af6f-75938ef26b14.png">

Once the neural network outputs PIF and PAF tensors, they are processed by an algorithm described below:

<img width="337" alt="image" src="https://user-images.githubusercontent.com/97082108/203295686-91305e9c-e455-4ac8-9652-978f9ec8463d.png">
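As a toy sketch of this encode/decode split (illustration only: `decode_intensity_field` and the grid layout are made up for this example and are not part of OpenPifPaf or its real CifCaf algorithm):

```python
import numpy as np

# Toy illustration: a "part intensity field" stores a per-keypoint confidence
# map over a coarse grid; "decoding" here just recovers the most confident
# grid cell via argmax. The real decoder is far more involved.
def decode_intensity_field(field):
    idx = np.unravel_index(np.argmax(field), field.shape)
    return idx, float(field[idx])

field = np.zeros((5, 5))
field[2, 3] = 0.9  # peak confidence where the keypoint lies
loc, conf = decode_intensity_field(field)  # loc == (2, 3), conf == 0.9
```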

For speed reasons, the decoding in the original `OpenPifPaf` repository is implemented in [C++ and libtorch](https://github.com/openpifpaf/openpifpaf/issues/560): `https://github.com/openpifpaf/openpifpaf/src/openpifpaf/csrc`

Rewriting this functionality would be a significant engineering effort, so we reuse part of the original implementation in the pipeline, as shown below.

### On the pipeline instantiation
### Pipeline Instantiation

```python
model_cpu, _ = network.Factory().factory(head_metas=None)
self.processor = decoder.factory(model_cpu.head_metas)
```

First, we fetch the default `model` object from the factory (the factory also returns a second value, the last epoch of the pre-trained model). Note: this `model` will not be used for inference, but only to pull the information
about the heads of the model: `model_cpu.head_metas: List[Cif, Caf]`. This information will be consumed to create a (set of) decoder(s) (objects that map `fields`, raw network output, to human-understandable annotations).
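The pattern can be sketched as follows (all class and function names below are hypothetical stand-ins for this illustration, not the real openpifpaf API):

```python
# Hypothetical sketch: a factory inspects the head metadata and pairs the
# Cif/Caf heads into a single pose decoder.
class CifMeta:
    pass

class CafMeta:
    pass

class CifCafDecoder:
    def __init__(self, cif_meta, caf_meta):
        self.cif_meta = cif_meta
        self.caf_meta = caf_meta

def decoder_factory(head_metas):
    # pair up the Cif and Caf heads into one pose decoder
    cif = next(m for m in head_metas if isinstance(m, CifMeta))
    caf = next(m for m in head_metas if isinstance(m, CafMeta))
    return [CifCafDecoder(cif, caf)]

decoders = decoder_factory([CifMeta(), CafMeta()])  # one CifCaf decoder
```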

Note: The `Cif` and `Caf` objects seem to be dataset-dependent. For example, they hold the information about the expected relationship of the joints of the pose (skeleton).

Hint: Instead of returning Annotation objects, the API supports returning annotations as JSON serializable dicts. This is probably what we should aim for.
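A minimal sketch of that idea, using a stand-in class assumed to mimic the `json_data()` method that openpifpaf's `Annotation` exposes:

```python
import json

# FakeAnnotation stands in for openpifpaf's Annotation class, which is
# assumed here to expose a json_data() method returning a plain dict.
class FakeAnnotation:
    def __init__(self, keypoints, score):
        self.keypoints = keypoints
        self.score = score

    def json_data(self):
        return {"keypoints": self.keypoints, "score": self.score}

def annotations_to_json(annotations):
    # one JSON-serializable dict per detected pose
    return [ann.json_data() for ann in annotations]

payload = json.dumps(annotations_to_json([FakeAnnotation([0.1, 0.2, 0.9], 0.85)]))
```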

In the default scenario (likely for all the pose estimation tasks), the `self.processor` will be a `Multi` object that holds a single `CifCaf` decoder.

Other available decoders are:

```python
{
openpifpaf.decoder.tracking_pose.TrackingPose # for tracking task
}
```
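A minimal, hypothetical reimplementation of the `Multi` behavior described above (a thin wrapper that fans the same fields out to each contained decoder and concatenates the results):

```python
# Hypothetical sketch, not the real openpifpaf class.
class Multi:
    def __init__(self, decoders):
        self.decoders = decoders

    def __call__(self, fields):
        annotations = []
        for decode in self.decoders:
            annotations.extend(decode(fields))
        return annotations

# a single "decoder" that reports how many fields it was given
multi = Multi([lambda fields: [f"pose from {len(fields)} fields"]])
result = multi(["cif", "caf"])  # ['pose from 2 fields']
```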

## Engine Output Preprocessing

```python
def process_engine_outputs(self, fields):
    cif, caf = fields  # assumed unpacking; the intervening lines are collapsed in the source
    # (collapsed in the source) the processor's private decoding method is
    # called with:
    #     [torch.tensor(cif),
    #      torch.tensor(caf)], None, None
```
We are passing the CIF and CAF values directly to the processor through the private function (`self.processor`, by default, does batching and inference).
Perhaps this is the functionality that we would like to fold into our computational graph:
1. To avoid being dependent on the external library and their torch-dependent implementation
2. Having control over (and the possibility to improve upon) the generic decoder
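A sketch of point 1, assuming a placeholder numpy-only processing step (the peak lookup below is a made-up stand-in, not the real decoding algorithm):

```python
import numpy as np

# Hypothetical torch-free processing step: it operates directly on the
# engine's numpy outputs, so no external torch-based decoder is needed.
def process_engine_outputs_numpy(fields):
    cif, caf = fields
    # placeholder "decode": report the peak confidence per field
    return {"cif_peak": float(np.max(cif)), "caf_peak": float(np.max(caf))}

out = process_engine_outputs_numpy(
    [np.array([[0.2, 0.7]]), np.array([[0.1, 0.4]])]
)  # {'cif_peak': 0.7, 'caf_peak': 0.4}
```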