Add custom ONNX guides for use-cases. Add bucketing and scheduler guides (#960)

* added user guide

* Delete qa_server_config.yaml

* removed gatsby headers

* update benchmarking

* Update benchmarking.md

* Update and rename benchmarking.md to deepsparse-benchmarking.md

* Update deepsparse-pipelines.md

* Update deepsparse-server.md

* Update scheduler.md

* Update user-guide/scheduler.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* Update user-guide/scheduler.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* Update user-guide/scheduler.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* Update user-guide/deepsparse-pipelines.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* Update user-guide/deepsparse-pipelines.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* Update user-guide/deepsparse-pipelines.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* added README

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* added sentiment-analysis

* Update sentiment-analysis.md

* added installation

* Update installation.md

* Update installation.md

* Update README.md

* Update deepsparse-pipelines.md

* Update deepsparse-pipelines.md

* add text classification doc

* add text classification doc

* add text classification doc

* Use Engine

* add question answering document

* add token classification document

* update benchmarks

* add transformers extraction embedding doc

* add general embedding doc

* add image classification doc

* add image classification doc

* add yolo document

* add YOLACT doc

* update yolov5 doc

* update yolov5 doc

* Update yolov5-object-detection.md

* Update image-classification.md

* Update image-segmentation-yolact.md

* Apply suggestions from code review

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* RS Edits to CV

* updated embedding extraction example

* updated sentiment analysis and text classification examples

* added zero shot text classification

* RS edited token classification

* updated question answering example

* updated embedding extraction case

* updated directory structure

* updated dir structure

* updated dir structure

* Update image-classification.md

* Update image-classification.md

* Update image-classification.md

* Update object-detection-yolov5.md

* Update object-detection-yolov5.md

* Update object-detection-yolov5.md

* Update image-segmentation-yolact.md

* Update image-segmentation-yolact.md

* Update embedding-extraction.md

* Update sentiment-analysis.md

* Update question-answering.md

* Update text-classification.md

* Update embedding-extraction.md

* Create README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update deepsparse-pipelines.md

* Update deepsparse-pipelines.md

* Update deepsparse-pipelines.md

* Update deepsparse-server.md

* Update deepsparse-pipelines.md

* Update deepsparse-server.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update image-segmentation-yolact.md

* Update image-classification.md

* Update object-detection-yolov5.md

* Update question-answering.md

* Update sentiment-analysis.md

* Update text-classification.md

* Update token-classification.md

* Update transformers-embedding-extraction.md

* Update zero-shot-text-classification.md

* Update question-answering.md

* Update question-answering.md

* Update question-answering.md

* Update sentiment-analysis.md

* Update sentiment-analysis.md

* Update sentiment-analysis.md

* Update text-classification.md

* Update text-classification.md

* Update text-classification.md

* Update text-classification.md

* Update token-classification.md

* Update token-classification.md

* Update zero-shot-text-classification.md

* Update zero-shot-text-classification.md

* Update zero-shot-text-classification.md

* Update transformers-embedding-extraction.md

* Update embedding-extraction.md

* Update image-classification.md

* Update image-segmentation-yolact.md

* Update object-detection-yolov5.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* added copyrights

* Update README.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* Update README.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* Update README.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* Update README.md

Co-authored-by: Michael Goin <michael@neuralmagic.com>

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* How to use the scheduler across engine, pipeline, server

* How to use the scheduler across engine, pipeline, server

* Using custom ONNX file with YOLOv5

* YOLACT ONNX docs

* ResNet ONNX docs

* ONNX embedding extraction

* custom ONNX question answering

* custom ONNX sentiment analysis

* update sentiment and QA docs

* update sentiment and QA docs

* text classification ONNX

* token classification ONNX

* transformer embedding extraction ONNX

* zero shot text classification ONNX

* Add copy right

* bucketing docs

* update bucketing

* Download models

* move ONNX docs

* update model download section

* Update qa docs

* scheduler update

* Fix merge with main

---------

Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Derrick Mwiti <mwitiderrick@gmail.com>
Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>
4 people committed Sep 1, 2023
1 parent 136b9f8 commit fa73631
Showing 13 changed files with 620 additions and 1 deletion.
28 changes: 28 additions & 0 deletions docs/use-cases/cv/embedding-extraction.md
@@ -106,3 +106,31 @@ print(len(result["embeddings"][0][0]))

### Cross Use Case Functionality
Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring the Server.

## Using a Custom ONNX File
Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files for embedding extraction.

The first step is to obtain an ONNX model, typically by exporting your trained model to ONNX.

Download the [ResNet-50 - ImageNet](https://sparsezoo.neuralmagic.com/models/cv%2Fclassification%2Fresnet_v1-50%2Fpytorch%2Fsparseml%2Fimagenet%2Fpruned95_uniform_quant-none) ONNX model for demonstration:

```bash
sparsezoo.download zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_uniform_quant-none --save-dir ./embedding-extraction
```
Use the ResNet-50 ONNX model for embedding extraction:
```python
from deepsparse import Pipeline

# this step removes the projection head before compiling the model
rn50_embedding_pipeline = Pipeline.create(
    task="embedding-extraction",
    base_task="image-classification",  # tells the pipeline to expect images and normalize input with ImageNet means/stds
    model_path="embedding-extraction/model.onnx",
    emb_extraction_layer=-3,  # extracts the last layer before the projection head and softmax
)

# this step runs pre-processing, inference and returns an embedding
embedding = rn50_embedding_pipeline(images="lion.jpeg")
print(len(embedding.embeddings[0][0]))
# 2048
```
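As a quick check of the embeddings, you can compare two images with cosine similarity. Below is a minimal sketch reusing the pipeline compiled above; `tiger.jpeg` is a hypothetical second local image:

```python
import numpy as np

# embed both images with the pipeline compiled above
emb_a = np.array(rn50_embedding_pipeline(images="lion.jpeg").embeddings[0][0])
emb_b = np.array(rn50_embedding_pipeline(images="tiger.jpeg").embeddings[0][0])  # hypothetical image

# cosine similarity close to 1.0 indicates visually similar content
cos_sim = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
print(cos_sim)
```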
25 changes: 25 additions & 0 deletions docs/use-cases/cv/image-classification.md
@@ -259,6 +259,31 @@ resp = requests.post(url=url, files=files)
print(resp.text)
# {"labels":[291,260,244],"scores":[24.185693740844727,18.982254028320312,16.390701293945312]}
```

### Cross Use Case Functionality

Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring the Server.

## Using a Custom ONNX File
Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files when deploying a model.

The first step is to obtain an ONNX model, typically by exporting your trained model to ONNX, as sketched below.
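Here is a minimal export sketch with PyTorch and torchvision; the pretrained dense `resnet50` weights stand in for your own trained checkpoint, and the input shape and opset version are assumptions to adjust for your setup:

```python
import torch
import torchvision.models as models

# load a trained classification model (use your own checkpoint in practice)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# export with a fixed input shape; DeepSparse consumes the resulting file
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13)
```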

Alternatively, for this demonstration, download the sparsified [ResNet-50 - ImageNet](https://sparsezoo.neuralmagic.com/models/cv%2Fclassification%2Fresnet_v1-50%2Fpytorch%2Fsparseml%2Fimagenet%2Fpruned95_uniform_quant-none) ONNX model from SparseZoo:
```bash
sparsezoo.download zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_uniform_quant-none --save-dir ./image_classification
```
Use the ResNet-50 ONNX model for inference:
```python
from deepsparse import Pipeline

# compile the local ONNX file with batch size 1
pipeline = Pipeline.create(
    task="image_classification",
    model_path="image_classification/model.onnx",  # SparseZoo stub or path to a local ONNX file
)

# run inference on an image file
prediction = pipeline(images=["lion.jpeg"])
print(prediction.labels)
# [291]
```
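The pipeline output also carries confidence scores alongside the predicted labels, as the server response earlier in this document shows. A small sketch pairing them, assuming `scores` is parallel to `labels`:

```python
# print each predicted ImageNet class id with its confidence score
for label, score in zip(prediction.labels, prediction.scores):
    print(label, score)
```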
26 changes: 26 additions & 0 deletions docs/use-cases/cv/image-segmentation-yolact.md
@@ -224,6 +224,32 @@ resp = requests.post(url=url, files=files)
annotations = json.loads(resp.text) # dictionary of annotation results
boxes, classes, masks, scores = annotations["boxes"], annotations["classes"], annotations["masks"], annotations["scores"]
```

### Cross Use Case Functionality

Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring the Server.

## Using a Custom ONNX File
Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files when deploying a model.

The first step is to obtain an ONNX model, typically by exporting your trained model to ONNX.

Download the [YOLACT](https://sparsezoo.neuralmagic.com/models/cv%2Fsegmentation%2Fyolact-darknet53%2Fpytorch%2Fdbolya%2Fcoco%2Fpruned82_quant-none) ONNX model for demonstration:
```bash
sparsezoo.download zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none --save-dir ./yolact
```
Use the YOLACT ONNX model for inference:
```python
from deepsparse.pipeline import Pipeline

yolact_pipeline = Pipeline.create(
    task="yolact",
    model_path="yolact/model.onnx",
)

images = ["thailand.jpeg"]
predictions = yolact_pipeline(images=images)
# predictions has attributes `boxes`, `classes`, `masks` and `scores`
print(predictions.classes[0])
# [20, 20, ..., 0, 0, 24]
```
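To get a quick sense of what was detected, you can tally the predicted COCO class IDs from the snippet above:

```python
from collections import Counter

# count how many instances of each COCO class id were detected
counts = Counter(predictions.classes[0])
print(counts.most_common(3))
```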
36 changes: 36 additions & 0 deletions docs/use-cases/cv/object-detection-yolov5.md
@@ -285,3 +285,39 @@ print(labels)
### Cross Use Case Functionality

Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring a Server.

## Using a Custom ONNX File
Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files when deploying a model.

The first step is to obtain a YOLOv5 ONNX model; this could be a YOLOv5 model you have trained and exported to ONNX.
Here, we demonstrate by exporting a pretrained YOLOv5 model to ONNX with the `ultralytics` package:
```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolov5nu.pt") # load a pretrained model
success = model.export(format="onnx") # export the model to ONNX format
```
Download a sample image for detection:
```bash
wget -O basilica.jpg https://github.com/raw/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
```
Next, run the DeepSparse object detection pipeline with the custom ONNX file:

```python
from deepsparse import Pipeline

# compile the local ONNX file with batch size 1
yolo_pipeline = Pipeline.create(
    task="yolo",
    model_path="yolov5nu.onnx",  # SparseZoo stub or path to a local ONNX file
)
images = ["basilica.jpg"]

# run inference on image file
pipeline_outputs = yolo_pipeline(images=images)
print(pipeline_outputs.boxes)
print(pipeline_outputs.labels)
# [[[-0.8809833526611328, 5.1244752407073975, 27.885415077209473, 57.20366072654724], [-9.014896631240845, -2.4366320967674255, 21.488688468933105, 37.2245477437973], [14.241515636444092, 11.096746131777763, 30.164274215698242, 22.02291651070118], [7.107024908065796, 5.017698150128126, 15.09239387512207, 10.45704211294651]]]
# [['8367.0', '1274.0', '8192.0', '6344.0']]
```
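To work with individual detections, iterate over the parallel output lists. A short sketch, assuming the output also exposes per-box confidence `scores` alongside `boxes` and `labels`:

```python
# pair each box with its label and confidence for the first image
for box, label, score in zip(
    pipeline_outputs.boxes[0], pipeline_outputs.labels[0], pipeline_outputs.scores[0]
):
    print(f"label={label} score={score:.2f} box={box}")
```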
135 changes: 135 additions & 0 deletions docs/use-cases/general/bucketing.md
@@ -0,0 +1,135 @@
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# How to Use Bucketing With DeepSparse
DeepSparse supports bucketing to lower latency and increase the throughput of deep learning pipelines. Grouping sequences of similar lengths into buckets reduces the padding required and therefore speeds up inference.

Input lengths in NLP problems vary. The usual approach is to pick a maximum length, truncate sentences longer than that maximum, and pad shorter ones up to it. For real-world applications this can be inefficient, wasting compute and memory on padding.

Bucketing is a solution that places sequences of varying lengths in different buckets. It is more efficient because it reduces the amount of padding required.

In this document, we will explore how to use bucketing with DeepSparse.

## How Bucketing Works in DeepSparse
DeepSparse handles bucketing natively, saving you the time of building this preprocessing pipeline yourself, and delivers a performance boost over an equivalent pipeline without bucketing. When buckets are provided, DeepSparse compiles a separate model for each of the provided input sizes.

For example, say your input lengths range from 157 to 4,063 tokens with a median of 700, and you are using a model like BERT, whose maximum sequence length is 512. You could define the buckets `[256, 320, 384, 448, 512]`: sequences shorter than 256 tokens are padded to 256, sequences between 256 and 320 tokens are padded to 320, and so on, while sequences longer than 512 are truncated to 512.

At inference time, each input is routed to the model for its bucket; in this case you would have five models because you defined five buckets. Bucketing reduces compute because sequences are no longer all padded to the maximum length in the dataset. Decide on bucket sizes by examining the length distribution of your data and experimenting with different values; the best choice covers the full range of inputs in the dataset. A minimal bucketed pipeline is sketched below.
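Here is a minimal sketch of a bucketed pipeline, reusing the token classification model featured later in this guide; the five bucket sizes are the illustrative values from the example above:

```python
from deepsparse import Pipeline

# passing a list of sizes to sequence_length compiles one model per bucket
pipeline = Pipeline.create(
    task="token_classification",
    model_path="zoo:nlp/token_classification/bert-large/pytorch/huggingface/conll2003/base-none",
    sequence_length=[256, 320, 384, 448, 512],
)

# each input is padded to the smallest bucket that fits it and routed there
print(pipeline("DeepSparse routes this short sentence to the 256-token bucket."))
```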

## Bucketing NLP Models with DeepSparse
DeepSparse makes bucketing easy to set up: you pass the desired bucket sizes, and DeepSparse configures the buckets automatically. You can determine the optimal bucket sizes by analyzing the lengths of the input data and selecting buckets where most of the data lies.

For example, here's the distribution of the [wnut_17](https://huggingface.co/datasets/wnut_17) dataset:
![image](images/wnut.png)
Visualizing the data distribution enables you to choose the best bucket sizes to use.
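To produce a plot like this yourself, tokenize the dataset and histogram the token counts. A minimal sketch, using `bert-base-uncased` as a stand-in tokenizer (swap in your own model's tokenizer):

```python
import matplotlib.pyplot as plt
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # stand-in tokenizer
dataset = load_dataset("wnut_17", split="train")

# token count per sentence; wnut_17 stores sentences as lists of words
lengths = [len(tokenizer(" ".join(words))["input_ids"]) for words in dataset["tokens"]]

plt.hist(lengths, bins=50)
plt.xlabel("Tokens per sentence")
plt.ylabel("Count")
plt.show()
```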

Define a token classification pipeline that uses no buckets; later, you will compare its performance with one that uses buckets. The downloaded `dense-model` folder contains the configuration files for a token classification model, obtained by:
```bash
sparsezoo.download zoo:nlp/token_classification/bert-large/pytorch/huggingface/conll2003/base-none --save-dir ./dense-model
```
The folder contains:
- `config.json`
- `model.onnx`
- `tokenizer.json`

```python
from deepsparse import Pipeline
import deepsparse.transformers
from datasets import load_dataset
from transformers import AutoTokenizer
from tqdm import tqdm
import time

def run(model_path, batch_size, buckets):
    ### SETUP DATASET - in this case, we download WNUT_17
    print("Setting up the dataset:")

    INPUT_COL = "sentences"
    dataset = load_dataset("wnut_17", split="train")
    sentences = []
    for sentence in dataset["tokens"]:
        string = ""
        for elt in sentence:
            string += elt
            string += " "
        sentences.append(string)
    dataset = dataset.add_column(INPUT_COL, sentences)

    ### TOKENIZE DATASET - (used to compute the bucket sizes)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    def pre_process_fn(examples):
        return tokenizer(examples[INPUT_COL], add_special_tokens=True, return_tensors="np", padding=False, truncation=False)

    dataset = dataset.map(pre_process_fn, batched=True)
    dataset = dataset.add_column("num_tokens", list(map(len, dataset["input_ids"])))
    dataset = dataset.sort("num_tokens")
    max_token_len = dataset[-1]["num_tokens"]

    ### SPLIT DATA INTO BATCHES
    num_pad_items = batch_size - (dataset.num_rows % batch_size)
    inputs = ([""] * num_pad_items) + dataset[INPUT_COL]
    batches = []
    for b_index_start in range(0, len(inputs), batch_size):
        batches.append(inputs[b_index_start:b_index_start+batch_size])

    ### RUN THROUGHPUT TESTING
    print("\nCompiling models:")

    # compile a model for each bucket, with the dataset maximum as the last bucket
    buckets.append(max_token_len)
    ds_pipeline = Pipeline.create(
        "token_classification",
        model_path=model_path,
        batch_size=batch_size,
        sequence_length=buckets,
    )

    print("\nRunning test:")

    # run inferences on the dataset
    start = time.perf_counter()

    predictions = []
    for batch in tqdm(batches):
        predictions.append(ds_pipeline(batch))

    # flatten and remove padded predictions
    predictions = [pred for sublist in predictions for pred in sublist.predictions]
    predictions = predictions[num_pad_items:]
    end = time.perf_counter()

    # compute throughput
    total_time_executing = (end - start) * 1000.0  # milliseconds
    items_per_sec = len(predictions) / (total_time_executing / 1000.0)

    print(f"Items Per Second: {items_per_sec}")
    print(f"Program took: {total_time_executing} ms")
    return predictions

# point the pipeline at the downloaded model folder (adjust the path if the
# files landed in a "deployment" subfolder)
predictions = run("dense-model", 64, [])
# Items Per Second: 6.0998544593741395
# Program took: 556406.7179970443 ms
```

Run the same script, this time with bucket sizes defined:
```python
batch_size = 64
buckets = [15, 35, 55, 75]
predictions = run("dense-model", batch_size, buckets)
# Items Per Second: 10.46572543802951
# Program took: 324296.67872493155 ms
```
The pipeline using buckets processes about 1.7 times as many items per second as the one without.
Binary file added docs/use-cases/general/images/wnut.png
