[CLIP] Captioning Pipeline #1145

Merged
merged 47 commits into from
Aug 7, 2023
Commits
47 commits
9fa4197
initial refactor
dsikka Jun 19, 2023
073cf38
move BasePipeline to a new file
dsikka Jun 27, 2023
128f7eb
test fix
dsikka Jun 27, 2023
f5826a4
anothe test fix
dsikka Jun 27, 2023
da81c7d
fix import
dsikka Jun 27, 2023
a370e02
revert
dsikka Jun 27, 2023
04e01f5
initial refactor
dsikka Jun 19, 2023
1c0a086
add tests for BasePipeline
dsikka Jun 21, 2023
f66c83b
move BasePipeline to a new file
dsikka Jun 27, 2023
9f12482
initial refactor
dsikka Jun 19, 2023
2da0d75
update test; finish off initial refactoring changes post local testing
dsikka Jun 20, 2023
2951495
initial commit for clip zero-shot
dsikka Jun 23, 2023
e925ec8
add basic structure for text branch and zeroshot
dsikka Jun 25, 2023
3e02cda
add schema details
dsikka Jun 25, 2023
6f1491b
update pipelines after running mock engine tests
dsikka Jun 26, 2023
5bd3dc6
add zeroshot tests
dsikka Jun 27, 2023
a31fb4e
rebase fix
dsikka Jun 30, 2023
24ed6d1
clean-up comments; add note about onnx export issue
dsikka Jun 30, 2023
b144fa7
move paths to fixtures
dsikka Jun 30, 2023
775cdf5
rebase fix
dsikka Jul 13, 2023
a93986a
rebase fix
dsikka Jul 18, 2023
3d4f3c6
refactor pipelines to separate visual, text, and zeroshot. also add p…
dsikka Jul 21, 2023
3e2ee70
fix rebase
dsikka Aug 1, 2023
7c14b7a
initial refactor
dsikka Jun 19, 2023
d7595e8
move BasePipeline to a new file
dsikka Jun 27, 2023
cd35c2b
initial refactor
dsikka Jun 19, 2023
2624c41
move BasePipeline to a new file
dsikka Jun 27, 2023
bebe206
initial refactor
dsikka Jun 19, 2023
9a81e32
rebase fix
dsikka Jun 30, 2023
921818e
move paths to fixtures
dsikka Jun 30, 2023
836b157
initial refactor
dsikka Jun 19, 2023
a7f1e30
initial caption functionality
dsikka Jul 25, 2023
11e4c0e
debugging
dsikka Jul 26, 2023
10c7835
more debugging
dsikka Jul 26, 2023
7d1c5ca
post debugging code
dsikka Jul 27, 2023
0f9ebcc
fix imports
dsikka Jul 27, 2023
9b147fc
cleanup post model fix
dsikka Jul 29, 2023
699dafc
fix variable names, some clean-up
dsikka Jul 30, 2023
55cf8c6
remove image embs loading
dsikka Jul 31, 2023
83c9570
update dimensions
dsikka Jul 31, 2023
ce670a9
rebase
dsikka Jul 31, 2023
6c8cd4d
remove extra param
dsikka Jul 31, 2023
dd1d6b2
remove typo
dsikka Jul 31, 2023
04be990
update README instructions; fix linalg import
dsikka Jul 31, 2023
74a6e5b
clean-up pipelines, updatetyping and descriptions
dsikka Aug 1, 2023
d583fde
rebase fix
dsikka Aug 1, 2023
8fb97ad
expose pipeline engine args
dsikka Aug 3, 2023
6 changes: 5 additions & 1 deletion setup.py
@@ -162,7 +162,11 @@ def _parse_requirements_file(file_path):
"haystack_reqs.txt",
)
_haystack_integration_deps = _parse_requirements_file(_haystack_requirements_file_path)
_clip_deps = ["open_clip_torch==2.20.0", "scipy==1.10.1"]
_clip_deps = [
    "open_clip_torch==2.20.0",
    "scipy==1.10.1",
    f"{'nm-transformers' if is_release else 'nm-transformers-nightly'}",
]

_torch_deps = ["torch>=1.7.0,<=2.0"]

59 changes: 54 additions & 5 deletions src/deepsparse/clip/README.md
@@ -4,6 +4,7 @@ DeepSparse allows inference on [CLIP](https://github.com/mlfoundations/open_clip

The CLIP integration currently supports the following task:
- **Zero-shot Image Classification** - Classifying images given possible classes
- **Caption Generation** - Generate a caption given an image

## Getting Started

@@ -13,24 +14,38 @@ Before you start your adventure with the DeepSparse Engine, make sure that your
```pip install deepsparse[clip]```

### Model Format
By default, to deploy CLIP models using the DeepSparse Engine, it is required to supply the model in the ONNX format. This grants the engine the flexibility to serve any model in a framework-agnostic environment. To see examples of pulling CLIP models and exporting them to ONNX, please see the [sparseml documentation](https://github.com/neuralmagic/sparseml/tree/main/integrations/clip). For the Zero-shot image classification workflow, two ONNX models are required, a visual model for CLIP's visual branch, and a text model for CLIP's text branch. Both of these model should be produced through the sparseml integration linked above.
By default, to deploy CLIP models using the DeepSparse Engine, it is required to supply the model in the ONNX format. This grants the engine the flexibility to serve any model in a framework-agnostic environment. To see examples of pulling CLIP models and exporting them to ONNX, please see the [sparseml documentation](https://github.com/neuralmagic/sparseml/tree/main/integrations/clip).

For the Zero-shot image classification workflow, two ONNX models are required: a visual model for CLIP's visual branch and a text model for CLIP's text branch. Both models can be produced through the sparseml integration linked above. For caption generation, specific models called CoCa models are required; instructions on how to export CoCa models are also provided in the sparseml documentation above. The CoCa export pathway generates one additional decoder model, along with the text and visual models.
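
If you want to confirm an export before wiring up a pipeline, a quick check like the following can help. This is a minimal sketch, assuming the `caption_models/` layout and file names used by the caption example later in this README; adjust the paths to wherever your export actually wrote the ONNX files.

```python
from pathlib import Path

# Assumed layout: the caption example below expects these three CoCa exports.
caption_root = Path("caption_models")
expected = ["clip_visual.onnx", "clip_text.onnx", "clip_text_decoder.onnx"]

missing = [name for name in expected if not (caption_root / name).exists()]
if missing:
    raise FileNotFoundError(f"Missing exported CoCa models: {missing}")
```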

### Deployment examples:
The following example uses pipelines to run the CLIP models for inference. As input, the pipeline ingests a list of images and a list of possible classes. A class is returned for each of the provided images.
The following examples use pipelines to run the CLIP models for inference. For zero-shot prediction, the pipeline ingests a list of images and a list of possible classes, and returns a class for each of the provided images. For caption generation, only an image file is required.

If you don't have images ready, pull down the sample images using the following commands:

```bash
wget -O basilica.jpg https://github.com/raw/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
```

```bash
wget -O buddy.jpeg https://github.com/raw/neuralmagic/deepsparse/main/tests/deepsparse/pipelines/sample_images/buddy.jpeg
```

This will pull down two images, one with a happy dog and one with St.Peter's basilica.
```bash
wget -O thailand.jpg https://github.com/raw/neuralmagic/deepsparse/main/src/deepsparse/yolact/sample_images/thailand.jpg
```

<p float="left">
<img src="https://github.com/raw/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg" width="300" />
<img src="https://github.com/raw/neuralmagic/deepsparse/main/tests/deepsparse/pipelines/sample_images/buddy.jpeg" width="300" />
<img src="https://github.com/raw/neuralmagic/deepsparse/main/src/deepsparse/yolact/sample_images/thailand.jpg" width="300" />
</p>

This will pull down three images: a happy dog, St. Peter's basilica, and two elephants.

#### Zero-shot Prediction

Let's run an example to clasify the images. We'll provide the images in a list with their file names as well as a list of possible classes. We'll also provide paths to the exported ONNX models.
Let's run an example to classify the images. We'll provide the images in a list with their file names, as well as a list of possible classes. We'll also provide paths to the exported ONNX models under the `zeroshot_research` root folder.

```python
import numpy as np
@@ -43,7 +58,7 @@ from deepsparse.clip import (
)

possible_classes = ["ice cream", "an elephant", "a dog", "a building", "a church"]
images = ["basilica.jpg", "buddy.jpeg"]
images = ["basilica.jpg", "buddy.jpeg", "thailand.jpg"]

model_path_text = "zeroshot_research/text/model.onnx"
model_path_visual = "zeroshot_research/visual/model.onnx"
@@ -72,4 +87,38 @@ DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230727 C

Image basilica.jpg is a picture of a church
Image buddy.jpeg is a picture of a dog
Image thailand.jpg is a picture of an elephant
```
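
Part of the zero-shot example above is collapsed in the diff view. As a rough guide only, the construction likely resembles the caption example further down; in the sketch below, the task name `clip_zeroshot`, the keyword arguments, and the input field names are assumptions, while the `CLIPZeroShotInput`, `CLIPTextInput`, and `CLIPVisualInput` classes come from this PR's `src/deepsparse/clip/__init__.py`.

```python
# Hedged sketch of the collapsed portion of the zero-shot example.
# The task name "clip_zeroshot" and the keyword arguments are assumptions
# inferred from the caption example; only the class names are taken from
# this PR's __init__.py exports.
from deepsparse import BasePipeline
from deepsparse.clip import CLIPTextInput, CLIPVisualInput, CLIPZeroShotInput

possible_classes = ["ice cream", "an elephant", "a dog", "a building", "a church"]
images = ["basilica.jpg", "buddy.jpeg", "thailand.jpg"]

pipeline = BasePipeline.create(
    task="clip_zeroshot",  # assumed task name
    visual_model_path="zeroshot_research/visual/model.onnx",
    text_model_path="zeroshot_research/text/model.onnx",
)

pipeline_input = CLIPZeroShotInput(
    image=CLIPVisualInput(images=images),
    text=CLIPTextInput(text=possible_classes),  # field names are assumptions
)
output = pipeline(pipeline_input)  # per-image scores over the candidate classes
```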

#### Caption Generation
Let's try a caption generation example. We'll leverage the `thailand.jpg` file that was pulled down earlier. We'll also provide paths to the three exported CoCa ONNX models under the `caption_models` folder.

```python
from deepsparse import BasePipeline
from deepsparse.clip import CLIPCaptionInput, CLIPVisualInput

root = "caption_models"
model_path_visual = f"{root}/clip_visual.onnx"
model_path_text = f"{root}/clip_text.onnx"
model_path_decoder = f"{root}/clip_text_decoder.onnx"
engine_args = {"num_cores": 8}

kwargs = {
    "visual_model_path": model_path_visual,
    "text_model_path": model_path_text,
    "decoder_model_path": model_path_decoder,
    "pipeline_engine_args": engine_args
}
pipeline = BasePipeline.create(task="clip_caption", **kwargs)

pipeline_input = CLIPCaptionInput(image=CLIPVisualInput(images="thailand.jpg"))
output = pipeline(pipeline_input).caption
print(output[0])
```
Running the code above, we get the following caption:

```
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230727 COMMUNITY | (3cb4a3e5) (optimized) (system=avx2, binary=avx2)

an adult elephant and a baby elephant .
```
22 changes: 6 additions & 16 deletions src/deepsparse/clip/__init__.py
@@ -11,21 +11,11 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# flake8: noqa
from deepsparse.clip.decoder_pipeline import *
from deepsparse.clip.text_pipeline import *
from deepsparse.clip.visual_pipeline import *


from deepsparse.clip.text_pipeline import (
    CLIPTextInput,
    CLIPTextOutput,
    CLIPTextPipeline,
)
from deepsparse.clip.visual_pipeline import (
    CLIPVisualInput,
    CLIPVisualOutput,
    CLIPVisualPipeline,
)
from deepsparse.clip.zeroshot_pipeline import (
    CLIPZeroShotInput,
    CLIPZeroShotOutput,
    CLIPZeroShotPipeline,
)
from deepsparse.clip.zeroshot_pipeline import * # isort:skip
from deepsparse.clip.captioning_pipeline import * # isort:skip