Add custom ONNX guides for use-cases. Add bucketing and scheduler guides #960

Merged 177 commits on Sep 1, 2023
Commits
7cd8dbf
added user guide
robertgshaw2-neuralmagic Mar 16, 2023
a818bce
Delete qa_server_config.yaml
robertgshaw2-neuralmagic Mar 16, 2023
0dfb6e1
removed gatsby headers
robertgshaw2-neuralmagic Mar 16, 2023
b0d3454
update benchmarking
robertgshaw2-neuralmagic Mar 16, 2023
6124994
Update benchmarking.md
robertgshaw2-neuralmagic Mar 16, 2023
a18a79b
Update and rename benchmarking.md to deepsparse-benchmarking.md
robertgshaw2-neuralmagic Mar 16, 2023
639b8c1
Update deepsparse-pipelines.md
robertgshaw2-neuralmagic Mar 16, 2023
5b5c23a
Update deepsparse-server.md
robertgshaw2-neuralmagic Mar 16, 2023
15624f6
Update scheduler.md
robertgshaw2-neuralmagic Mar 16, 2023
17a2e68
Update user-guide/scheduler.md
robertgshaw2-neuralmagic Mar 16, 2023
8053834
Update user-guide/scheduler.md
robertgshaw2-neuralmagic Mar 16, 2023
eb57109
Update user-guide/scheduler.md
robertgshaw2-neuralmagic Mar 16, 2023
3946aca
Update user-guide/deepsparse-pipelines.md
robertgshaw2-neuralmagic Mar 16, 2023
d9803bb
Update user-guide/deepsparse-pipelines.md
robertgshaw2-neuralmagic Mar 16, 2023
7e0a97b
Update user-guide/deepsparse-pipelines.md
robertgshaw2-neuralmagic Mar 16, 2023
ca0f27d
added README
robertgshaw2-neuralmagic Mar 16, 2023
61d1126
Merge branch 'rs/docs-update-user-guide' of github.com:neuralmagic/de…
robertgshaw2-neuralmagic Mar 16, 2023
7a55eec
Update README.md
robertgshaw2-neuralmagic Mar 16, 2023
f18acb3
Update README.md
robertgshaw2-neuralmagic Mar 16, 2023
9e5f05f
Update README.md
robertgshaw2-neuralmagic Mar 16, 2023
8a2c666
Update README.md
robertgshaw2-neuralmagic Mar 16, 2023
8ff12f5
Update README.md
robertgshaw2-neuralmagic Mar 16, 2023
19804f2
Update README.md
robertgshaw2-neuralmagic Mar 16, 2023
53e9a93
Update README.md
robertgshaw2-neuralmagic Mar 16, 2023
d8972f2
added sentiment-analysis
robertgshaw2-neuralmagic Mar 16, 2023
e9e2685
Update sentiment-analysis.md
robertgshaw2-neuralmagic Mar 16, 2023
dd5bdfd
added installation
robertgshaw2-neuralmagic Mar 16, 2023
b84e01e
Update installation.md
robertgshaw2-neuralmagic Mar 16, 2023
c145ac4
Update installation.md
robertgshaw2-neuralmagic Mar 16, 2023
f978ba4
Update README.md
robertgshaw2-neuralmagic Mar 16, 2023
380cf6e
Update deepsparse-pipelines.md
robertgshaw2-neuralmagic Mar 16, 2023
643af49
Update deepsparse-pipelines.md
robertgshaw2-neuralmagic Mar 16, 2023
12092ee
add text classification doc
mwitiderrick Mar 20, 2023
13ac283
add text classification doc
mwitiderrick Mar 21, 2023
3c256d2
add text classification doc
mwitiderrick Mar 21, 2023
eaaca7a
Use Engine
mwitiderrick Mar 21, 2023
1b5cf02
add question answering document
mwitiderrick Mar 21, 2023
dd2bed8
add token classification document
mwitiderrick Mar 22, 2023
2108ed6
update benchmarks
mwitiderrick Mar 22, 2023
d057130
add transformers extraction embedding doc
mwitiderrick Mar 22, 2023
ddb3091
add general embedding doc
mwitiderrick Mar 22, 2023
e99a196
add image classification doc
mwitiderrick Mar 23, 2023
a7acf3c
add image classification doc
mwitiderrick Mar 23, 2023
2b74cf1
add yolo document
mwitiderrick Mar 23, 2023
9fed9c8
add YOLACT doc
mwitiderrick Mar 24, 2023
bd227f0
update yolov5 doc
mwitiderrick Mar 24, 2023
1a7334d
update yolov5 doc
mwitiderrick Mar 24, 2023
46e78c2
Update yolov5-object-detection.md
mwitiderrick Apr 3, 2023
92e4dc8
Update image-classification.md
mwitiderrick Apr 3, 2023
d259fbc
Update image-segmentation-yolact.md
mwitiderrick Apr 3, 2023
59c3efe
Apply suggestions from code review
mgoin Apr 12, 2023
0412a98
Merge branch 'main' into rs/docs-update-user-guide
mgoin Apr 12, 2023
5f77059
Merge branch 'main' into rs/docs-update-use-cases
mgoin Apr 13, 2023
6a1a7d9
RS Edits to CV
robertgshaw2-neuralmagic Apr 17, 2023
ae25136
updated embedding extraction example
robertgshaw2-neuralmagic Apr 17, 2023
94b2f04
updated sentiment analysis and text classification examples
robertgshaw2-neuralmagic Apr 17, 2023
4ebdc59
added zero shot text classification
robertgshaw2-neuralmagic Apr 17, 2023
0a56876
RS edited token classification
robertgshaw2-neuralmagic Apr 18, 2023
e31c22e
updated question answering example
robertgshaw2-neuralmagic Apr 18, 2023
f260be7
updated embedding extraction case
robertgshaw2-neuralmagic Apr 18, 2023
558773c
Merge pull request #1003 from neuralmagic/rs/docs-update-user-guide
robertgshaw2-neuralmagic Apr 18, 2023
05a7f07
updated directory structure
robertgshaw2-neuralmagic Apr 18, 2023
588b658
updated dir structure
robertgshaw2-neuralmagic Apr 18, 2023
7cb5671
updated dir structure
robertgshaw2-neuralmagic Apr 18, 2023
b483960
Update image-classification.md
robertgshaw2-neuralmagic Apr 18, 2023
0fd2eb1
Update image-classification.md
robertgshaw2-neuralmagic Apr 18, 2023
d89739b
Update image-classification.md
robertgshaw2-neuralmagic Apr 18, 2023
6942c5c
Update object-detection-yolov5.md
robertgshaw2-neuralmagic Apr 18, 2023
bc28cf1
Update object-detection-yolov5.md
robertgshaw2-neuralmagic Apr 18, 2023
2297bec
Update object-detection-yolov5.md
robertgshaw2-neuralmagic Apr 18, 2023
ef32b3a
Update image-segmentation-yolact.md
robertgshaw2-neuralmagic Apr 18, 2023
5390079
Update image-segmentation-yolact.md
robertgshaw2-neuralmagic Apr 18, 2023
0755b3a
Update embedding-extraction.md
robertgshaw2-neuralmagic Apr 18, 2023
739ba15
Update sentiment-analysis.md
mwitiderrick Apr 19, 2023
dc724e1
Update question-answering.md
mwitiderrick Apr 19, 2023
59790f9
Update text-classification.md
mwitiderrick Apr 19, 2023
8fbee7a
Update embedding-extraction.md
robertgshaw2-neuralmagic Apr 19, 2023
c7287c5
Create README.md
robertgshaw2-neuralmagic Apr 19, 2023
b00ea3c
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
cab0f2c
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
c02a9a4
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
4e9a61c
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
e0b3fc6
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
ea2915c
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
f05ba01
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
847621b
Update deepsparse-pipelines.md
robertgshaw2-neuralmagic Apr 19, 2023
1b672c9
Update deepsparse-pipelines.md
robertgshaw2-neuralmagic Apr 19, 2023
0482754
Update deepsparse-pipelines.md
robertgshaw2-neuralmagic Apr 19, 2023
a78f494
Update deepsparse-server.md
robertgshaw2-neuralmagic Apr 19, 2023
f9914bb
Update deepsparse-pipelines.md
robertgshaw2-neuralmagic Apr 19, 2023
e6420d3
Update deepsparse-server.md
robertgshaw2-neuralmagic Apr 19, 2023
66e264b
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
3633383
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
981774a
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
a2ecfa7
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
b8af6ba
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
0e123ef
Update image-segmentation-yolact.md
robertgshaw2-neuralmagic Apr 19, 2023
c2aa202
Update image-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
2e267cd
Update object-detection-yolov5.md
robertgshaw2-neuralmagic Apr 19, 2023
bc12263
Update question-answering.md
robertgshaw2-neuralmagic Apr 19, 2023
d18f1eb
Update sentiment-analysis.md
robertgshaw2-neuralmagic Apr 19, 2023
c55572c
Update text-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
a444ee3
Update token-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
94f2644
Update transformers-embedding-extraction.md
robertgshaw2-neuralmagic Apr 19, 2023
cb8b5dd
Update zero-shot-text-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
8bca17d
Update question-answering.md
robertgshaw2-neuralmagic Apr 19, 2023
248ea13
Update question-answering.md
robertgshaw2-neuralmagic Apr 19, 2023
c6dda09
Update question-answering.md
robertgshaw2-neuralmagic Apr 19, 2023
3f1a30d
Update sentiment-analysis.md
robertgshaw2-neuralmagic Apr 19, 2023
2d3a89e
Update sentiment-analysis.md
robertgshaw2-neuralmagic Apr 19, 2023
c90cb3e
Update sentiment-analysis.md
robertgshaw2-neuralmagic Apr 19, 2023
a519ba0
Update text-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
417cf3a
Update text-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
b13e28e
Update text-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
f5a535c
Update text-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
bae0836
Update token-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
98ec61e
Update token-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
8d1c257
Update zero-shot-text-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
c3403d5
Update zero-shot-text-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
f86a976
Update zero-shot-text-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
ffc4911
Update transformers-embedding-extraction.md
robertgshaw2-neuralmagic Apr 19, 2023
645fd24
Update embedding-extraction.md
robertgshaw2-neuralmagic Apr 19, 2023
f48b58d
Update image-classification.md
robertgshaw2-neuralmagic Apr 19, 2023
aa59bc6
Update image-segmentation-yolact.md
robertgshaw2-neuralmagic Apr 19, 2023
a62be31
Update object-detection-yolov5.md
robertgshaw2-neuralmagic Apr 19, 2023
cb7d614
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
352f7e3
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
d215b47
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
f365fba
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
73ec549
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
08b5c69
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
4581506
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
3e73375
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
c560c09
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
2435705
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
8675523
Add files via upload
robertgshaw2-neuralmagic Apr 19, 2023
eef8116
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
0bc84a8
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
a31969a
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
d1fa169
added copyrights
robertgshaw2-neuralmagic Apr 19, 2023
b99fd9f
Merge branch 'main' into rs/docs-update-use-cases
robertgshaw2-neuralmagic Apr 19, 2023
593e130
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
60b8dbd
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
9be3f50
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
7ec9e9e
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
6be665e
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
1e85839
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
517d5d8
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
a84edb0
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
89714eb
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
bc4590b
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
f8893ed
Update README.md
robertgshaw2-neuralmagic Apr 19, 2023
091e68a
How to use the scheduler across engine, pipeline, server
mwitiderrick Apr 20, 2023
6473677
How to use the scheduler across engine, pipeline, server
mwitiderrick Apr 20, 2023
3bfee5c
Using custom ONNX file with YOLOv5
mwitiderrick Apr 24, 2023
5cafa53
YOLACT ONNX docs
mwitiderrick Apr 24, 2023
8b6ea6d
RestNet ONNX docs
mwitiderrick Apr 24, 2023
610189c
ONNX embedding extraction
mwitiderrick Apr 24, 2023
e02a937
custom ONNX question answering
mwitiderrick Apr 24, 2023
b9a697f
custom ONNX sentiment analysis
mwitiderrick Apr 24, 2023
84b84ad
update sentiment and QA docs
mwitiderrick Apr 24, 2023
fff00b1
update sentiment and QA docs
mwitiderrick Apr 24, 2023
5c917cb
text classification ONNX
mwitiderrick Apr 24, 2023
04a6468
token classification ONNX
mwitiderrick Apr 24, 2023
5974126
transformer embedding extraction ONNX
mwitiderrick Apr 24, 2023
dfaea15
zero shot text classification ONNX
mwitiderrick Apr 24, 2023
fe8c2ca
Add copy right
mwitiderrick Apr 24, 2023
6a05d9b
bucketing docs
mwitiderrick Apr 24, 2023
d5bc36b
update bucketing
mwitiderrick Apr 25, 2023
cbe101e
Download models
mwitiderrick Apr 25, 2023
37c24e4
move ONNX docs
mwitiderrick Apr 25, 2023
b6e2439
update model download section
mwitiderrick Apr 25, 2023
1b73b95
Update qa docs
mwitiderrick Apr 25, 2023
46fd24c
scheduler update
mwitiderrick Apr 25, 2023
1da576b
Merge branch 'main' into rs/docs-update-use-cases
mgoin Apr 25, 2023
c86e65f
Fix merge with main
mgoin Apr 25, 2023
c3f86d8
Merge branch 'main' into rs/docs-update-use-cases
mgoin Sep 1, 2023
28 changes: 28 additions & 0 deletions docs/use-cases/cv/embedding-extraction.md
@@ -106,3 +106,31 @@ print(len(result["embeddings"][0][0]))

### Cross Use Case Functionality
Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring the Server.

## Using a Custom ONNX File
Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files for embedding extraction.

The first step is to obtain the ONNX model. You can obtain the file by converting your model to ONNX after training.

Download the [ResNet-50 - ImageNet](https://sparsezoo.neuralmagic.com/models/cv%2Fclassification%2Fresnet_v1-50%2Fpytorch%2Fsparseml%2Fimagenet%2Fpruned95_uniform_quant-none) ONNX model for demonstration:

```bash
sparsezoo.download zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_uniform_quant-none --save-dir ./embedding-extraction
```
Use the ResNet-50 ONNX model for embedding extraction:
```python
from deepsparse import Pipeline

# this step removes the projection head before compiling the model
rn50_embedding_pipeline = Pipeline.create(
task="embedding-extraction",
base_task="image-classification", # tells the pipeline to expect images and normalize input with ImageNet means/stds
model_path="embedding-extraction/model.onnx",
emb_extraction_layer=-3, # extracts last layer before projection head and softmax
)

# this step runs pre-processing, inference and returns an embedding
embedding = rn50_embedding_pipeline(images="lion.jpeg")
print(len(embedding.embeddings[0][0]))
# 2048
```
25 changes: 25 additions & 0 deletions docs/use-cases/cv/image-classification.md
@@ -259,6 +259,31 @@ resp = requests.post(url=url, files=files)
print(resp.text)
# {"labels":[291,260,244],"scores":[24.185693740844727,18.982254028320312,16.390701293945312]}
```

### Cross Use Case Functionality

Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring the Server.
## Using a Custom ONNX File
Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files when deploying a model.

The first step is to obtain the ONNX model. You can obtain the file by converting your model to ONNX after training.

Download the [ResNet-50 - ImageNet](https://sparsezoo.neuralmagic.com/models/cv%2Fclassification%2Fresnet_v1-50%2Fpytorch%2Fsparseml%2Fimagenet%2Fpruned95_uniform_quant-none) ONNX model for demonstration:
```bash
sparsezoo.download zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_uniform_quant-none --save-dir ./image_classification
```
Use the ResNet-50 ONNX model for inference:
```python
from deepsparse import Pipeline

# compile the local ONNX model with batch size 1 (the default)
pipeline = Pipeline.create(
task="image_classification",
model_path="image_classification/model.onnx", # sparsezoo stub or path to local ONNX
)

# run inference on image file
prediction = pipeline(images=["lion.jpeg"])
print(prediction.labels)
# [291]
```
26 changes: 26 additions & 0 deletions docs/use-cases/cv/image-segmentation-yolact.md
@@ -224,6 +224,32 @@ resp = requests.post(url=url, files=files)
annotations = json.loads(resp.text) # dictionary of annotation results
boxes, classes, masks, scores = annotations["boxes"], annotations["classes"], annotations["masks"], annotations["scores"]
```

### Cross Use Case Functionality

Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring the Server.

## Using a Custom ONNX File
Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files when deploying a model.

The first step is to obtain the ONNX model. You can obtain the file by converting your model to ONNX after training.

Download the [YOLACT](https://sparsezoo.neuralmagic.com/models/cv%2Fsegmentation%2Fyolact-darknet53%2Fpytorch%2Fdbolya%2Fcoco%2Fpruned82_quant-none) ONNX model for demonstration:
```bash
sparsezoo.download zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none --save-dir ./yolact
```
Use the YOLACT ONNX model for inference:
```python
from deepsparse.pipeline import Pipeline

yolact_pipeline = Pipeline.create(
task="yolact",
model_path="yolact/model.onnx",
)

images = ["thailand.jpeg"]
predictions = yolact_pipeline(images=images)
# predictions has attributes `boxes`, `classes`, `masks` and `scores`
print(predictions.classes[0])
# [20,20, .......0, 0,24]
```
36 changes: 36 additions & 0 deletions docs/use-cases/cv/object-detection-yolov5.md
@@ -285,3 +285,39 @@ print(labels)
### Cross Use Case Functionality

Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring a Server.
## Using a Custom ONNX File
Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files when deploying a model.

The first step is to obtain the YOLOv5 ONNX model. This could be a YOLOv5 model you have trained and converted to ONNX.
In this case, let's demonstrate by converting a YOLOv5 model to ONNX using the `ultralytics` package:
```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolov5nu.pt") # load a pretrained model
success = model.export(format="onnx") # export the model to ONNX format
```
Download a sample image for detection:
```bash
wget -O basilica.jpg https://github.com/raw/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
```
Next, run the DeepSparse object detection pipeline with the custom ONNX file:

```python
from deepsparse import Pipeline

# compile the custom ONNX model with batch size 1 (the default)
yolo_pipeline = Pipeline.create(
task="yolo",
model_path="yolov5nu.onnx", # sparsezoo stub or path to local ONNX
)
images = ["basilica.jpg"]

# run inference on image file
pipeline_outputs = yolo_pipeline(images=images)
print(pipeline_outputs.boxes)
print(pipeline_outputs.labels)
# [[[-0.8809833526611328, 5.1244752407073975, 27.885415077209473, 57.20366072654724], [-9.014896631240845, -2.4366320967674255, 21.488688468933105, 37.2245477437973], [14.241515636444092, 11.096746131777763, 30.164274215698242, 22.02291651070118], [7.107024908065796, 5.017698150128126, 15.09239387512207, 10.45704211294651]]]
# [['8367.0', '1274.0', '8192.0', '6344.0']]
```
135 changes: 135 additions & 0 deletions docs/use-cases/general/bucketing.md
@@ -0,0 +1,135 @@
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# How to Use Bucketing With DeepSparse
DeepSparse supports bucketing to lower latency and increase the throughput of deep learning pipelines. Grouping sequences of similar lengths into buckets speeds up inference.

Input lengths in NLP problems can vary. A common approach is to pick a maximum sequence length, truncate sentences longer than that maximum, and pad shorter ones up to it. For real-world applications this can be inefficient, wasting compute and memory on padding.

Bucketing is a solution that places sequences of varying lengths in different buckets. It is more efficient because it reduces the amount of padding required.

In this document, we will explore how to use bucketing with DeepSparse.

## How Bucketing Works in DeepSparse
DeepSparse handles bucketing natively, saving the time you would otherwise spend building this preprocessing pipeline yourself. Bucketing with DeepSparse yields a performance boost compared to a pipeline without bucketing. When buckets are provided, DeepSparse compiles a separate model for each provided input size.

For example, suppose your input lengths range from 157 to 4063 tokens with a median of 700, and you are using a model like BERT, whose maximum sequence length is 512. You could use the bucket sizes [256, 320, 384, 448, 512]: a sequence shorter than 256 tokens is padded to 256, one between 256 and 320 is padded to 320, and so on, while anything longer than 512 is truncated to 512.

At inference, each input is sent to the corresponding bucketed model. In this case, you’d have 5 models because you have defined 5 buckets. Bucketing reduces the amount of compute because you are no longer padding all the sequences to the maximum length in the dataset. You can decide on the bucket sizes by examining the distribution of the dataset and experimenting with different sizes. The best choice is the one that covers all the inputs in the range of the dataset.
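The routing rule above can be sketched as picking the smallest bucket that fits the sequence and truncating anything beyond the largest bucket. This is a simplified illustration of the idea, not DeepSparse's internal code:

```python
def pick_bucket(seq_len, buckets):
    """Return the bucket length a sequence of seq_len tokens is padded/truncated to."""
    for bucket in sorted(buckets):
        if seq_len <= bucket:
            return bucket  # pad up to this bucket
    return max(buckets)    # longer than the largest bucket: truncate

buckets = [256, 320, 384, 448, 512]
print(pick_bucket(157, buckets))   # 256 (padded)
print(pick_bucket(300, buckets))   # 320 (padded)
print(pick_bucket(4063, buckets))  # 512 (truncated)
```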

## Bucketing NLP Models with DeepSparse
DeepSparse makes it easy to set up bucketing. You pass the desired bucket sizes, and DeepSparse will automatically set up the buckets. You can determine the optimal size of the buckets by analyzing the lengths of the input data and selecting buckets where most of the data lies.

For example, here's the distribution of the [wnut_17](https://huggingface.co/datasets/wnut_17) dataset:
![image](images/wnut.png)
Visualizing the data distribution enables you to choose the best bucket sizes to use.
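As a simple heuristic (a sketch, not DeepSparse functionality), you can place bucket boundaries at percentiles of the token-length distribution so each bucket covers a similar share of the data:

```python
import numpy as np

# token lengths of the dataset (illustrative values; use your tokenizer's output)
lengths = np.array([12, 18, 25, 30, 33, 41, 47, 55, 60, 72, 75, 90])

# bucket boundaries at the 25th/50th/75th/100th percentiles
buckets = [int(np.percentile(lengths, q)) for q in (25, 50, 75, 100)]
print(buckets)  # [28, 44, 63, 90]
```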

First, define a token classification pipeline that uses no buckets; later you will compare its performance with one that does. The downloaded folder contains the configuration files for a token classification model, obtained by:
```bash
sparsezoo.download zoo:nlp/token_classification/bert-large/pytorch/huggingface/conll2003/base-none --save-dir ./dense-model
```
The folder contains:
- `config.json`
- `model.onnx`
- `tokenizer.json`

```python
from deepsparse import Pipeline
import deepsparse.transformers
from datasets import load_dataset
from transformers import AutoTokenizer
from tqdm import tqdm
import time

def run(model_path, batch_size, buckets):
    ### SETUP DATASETS - in this case, we download WNUT_17
    print("Setting up the dataset:")

    INPUT_COL = "sentences"
    dataset = load_dataset("wnut_17", split="train")
    sentences = []
    for sentence in dataset["tokens"]:
        string = ""
        for elt in sentence:
            string += elt
            string += " "
        sentences.append(string)
    dataset = dataset.add_column(INPUT_COL, sentences)

    ### TOKENIZE DATASET - (used to compute buckets)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    def pre_process_fn(examples):
        return tokenizer(examples[INPUT_COL], add_special_tokens=True, return_tensors="np", padding=False, truncation=False)

    dataset = dataset.map(pre_process_fn, batched=True)
    dataset = dataset.add_column("num_tokens", list(map(len, dataset["input_ids"])))
    dataset = dataset.sort("num_tokens")
    max_token_len = dataset[-1]["num_tokens"]

    ### SPLIT DATA INTO BATCHES
    num_pad_items = batch_size - (dataset.num_rows % batch_size)
    inputs = ([""] * num_pad_items) + dataset[INPUT_COL]
    batches = []
    for b_index_start in range(0, len(inputs), batch_size):
        batches.append(inputs[b_index_start:b_index_start+batch_size])

    ### RUN THROUGHPUT TESTING
    print("\nCompiling models:")

    # compile model with buckets
    buckets.append(max_token_len)
    ds_pipeline = Pipeline.create(
        "token_classification",
        model_path=model_path,
        batch_size=batch_size,
        sequence_length=buckets,
    )

    print("\nRunning test:")

    # run inferences on the dataset
    start = time.perf_counter()

    predictions = []
    for batch in tqdm(batches):
        predictions.append(ds_pipeline(batch))

    # flatten and remove padded predictions
    predictions = [pred for sublist in predictions for pred in sublist.predictions]
    predictions = predictions[num_pad_items:]
    end = time.perf_counter()

    # compute throughput
    total_time_executing = (end - start) * 1000.0
    items_per_sec = len(predictions) / total_time_executing

    print(f"Items Per Second: {items_per_sec}")
    print(f"Program took: {total_time_executing} ms")
    return predictions

predictions = run("token_classification", 64, [])
# Items Per Second: 0.0060998544593741395
# Program took: 556406.7179970443 ms
```

Run the same script with varying input lengths:
```python
batch_size = 64
buckets = [15,35,55,75]
predictions = run("token_classification", batch_size, buckets)
# Items Per Second: 0.01046572543802951
# Program took: 324296.67872493155 ms
```
The pipeline with buckets processes about 1.7x as many items per second as the one without.
Binary file added docs/use-cases/general/images/wnut.png