Benchmark Script for Pipelines #1150

Merged: 54 commits from `pipeline-benchmark` into `main` on Aug 17, 2023.
Commits
- bff0e03 WIP pipeline benchmark script (Satrat, Jul 21, 2023)
- e26eaa7 simple script (Satrat, Jul 24, 2023)
- 218db5f Merge branch 'main' into pipeline-benchmark (Satrat, Jul 24, 2023)
- 7732296 share code and cleanup (Satrat, Jul 24, 2023)
- 956dbe8 adding additional cmd line arguments (Satrat, Jul 25, 2023)
- 6cbc99e image and text inputs (Satrat, Jul 25, 2023)
- 0143d31 json export of statistics (Satrat, Jul 25, 2023)
- 58edf05 clean up printed output (Satrat, Jul 25, 2023)
- 75bda3a adding support for real data (Satrat, Jul 25, 2023)
- b751e75 support for additional pipelines (Satrat, Jul 27, 2023)
- 76a5af9 expanding input schemas, allowing for kwargs (Satrat, Jul 27, 2023)
- 6cb6bef README, quality, additional args (Satrat, Jul 28, 2023)
- 75f5173 moving code around, update README (Satrat, Jul 28, 2023)
- 6148962 Merge branch 'main' into pipeline-benchmark (Satrat, Jul 28, 2023)
- 9202a6f adding unit tests (Satrat, Jul 28, 2023)
- 5c94bb1 Merge branch 'main' into pipeline-benchmark (Satrat, Jul 28, 2023)
- 2ed0185 adding missing test file (Satrat, Jul 28, 2023)
- 729447e skipping test w/high memory usage (Satrat, Jul 31, 2023)
- abb4811 skip test with high memory usage (Satrat, Jul 31, 2023)
- 8cdbe9b unit test memory (Satrat, Jul 31, 2023)
- 1058f0b add tests back in (Satrat, Jul 31, 2023)
- 249e645 add tests back in (Satrat, Jul 31, 2023)
- ba8688b fix async percentages (Satrat, Jul 31, 2023)
- ecf1559 fix new quality errors (Satrat, Jul 31, 2023)
- a25e4b6 Merge branch 'fix-quality-check' into pipeline-benchmark (Satrat, Jul 31, 2023)
- d4e80dd Merge branch 'main' into pipeline-benchmark (Satrat, Jul 31, 2023)
- e0f6ab3 pass num_streams, fix percentage calculation for async (Satrat, Aug 1, 2023)
- a9ae5d0 Merge branch 'pipeline-benchmark' of github.com:neuralmagic/deepspars… (Satrat, Aug 1, 2023)
- bc1d3e2 Merge branch 'main' into pipeline-benchmark (Satrat, Aug 1, 2023)
- 9473b79 fix for file loading (Satrat, Aug 1, 2023)
- c48d2e3 Merge branch 'pipeline-benchmark' of github.com:neuralmagic/deepspars… (Satrat, Aug 1, 2023)
- cc8de6a PR comments (Satrat, Aug 1, 2023)
- b5ec9ae PR comments (Satrat, Aug 1, 2023)
- 99b4051 BaseModel for pipeline config (Satrat, Aug 1, 2023)
- 67d8187 quality fix (Satrat, Aug 1, 2023)
- 8b9768e fix broken test (Satrat, Aug 2, 2023)
- 50d5a74 cleanup code, replace argpase with click (Satrat, Aug 2, 2023)
- 70f7440 Update README with example output (Satrat, Aug 4, 2023)
- 4c0396b Merge branch 'main' into pipeline-benchmark (Satrat, Aug 4, 2023)
- 9c398ca Merge branch 'main' into pipeline-benchmark (Satrat, Aug 8, 2023)
- 3afeec7 support for multiple timers, adding docstrings (Satrat, Aug 9, 2023)
- 86bb3d5 Merge branch 'pipeline-benchmark' of github.com:neuralmagic/deepspars… (Satrat, Aug 9, 2023)
- 2dfcac2 Merge branch 'main' into pipeline-benchmark (Satrat, Aug 9, 2023)
- df9a3f7 docstrings (Satrat, Aug 9, 2023)
- d8238bc Merge branch 'pipeline-benchmark' of github.com:neuralmagic/deepspars… (Satrat, Aug 9, 2023)
- b0bc840 add text generation example to README (Satrat, Aug 10, 2023)
- eba70d6 clean up timermanager usage (Satrat, Aug 10, 2023)
- e9b2367 Merge branch 'main' into pipeline-benchmark (Satrat, Aug 10, 2023)
- b5fb5b5 Merge branch 'main' into pipeline-benchmark (Satrat, Aug 14, 2023)
- 1eb3202 PR comments (Satrat, Aug 15, 2023)
- 289f545 style (Satrat, Aug 15, 2023)
- 749a752 PR comments (Satrat, Aug 15, 2023)
- 427e9c0 Merge branch 'main' into pipeline-benchmark (Satrat, Aug 15, 2023)
- 6264961 Merge branch 'main' into pipeline-benchmark (Satrat, Aug 17, 2023)
1 change: 1 addition & 0 deletions setup.py
@@ -301,6 +301,7 @@ def _setup_entry_points() -> Dict:
"deepsparse.analyze=deepsparse.analyze:main",
"deepsparse.check_hardware=deepsparse.cpu:print_hardware_capability",
"deepsparse.benchmark=deepsparse.benchmark.benchmark_model:main",
"deepsparse.benchmark_pipeline=deepsparse.benchmark.benchmark_pipeline:main", # noqa E501
"deepsparse.benchmark_sweep=deepsparse.benchmark.benchmark_sweep:main",
"deepsparse.server=deepsparse.server.cli:main",
"deepsparse.object_detection.annotate=deepsparse.yolo.annotate:main",
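
This one-line change registers the new `deepsparse.benchmark_pipeline` console command alongside the existing `deepsparse.benchmark` and `deepsparse.benchmark_sweep` entry points.
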
155 changes: 155 additions & 0 deletions src/deepsparse/benchmark/README.md
@@ -186,4 +186,159 @@ Latency Mean (ms/batch): 16.0732
Latency Median (ms/batch): 15.7850
Latency Std (ms/batch): 1.0427
Iterations: 622
```

## 📜 Benchmarking Pipelines
Expanding on the model benchmarking script, `deepsparse.benchmark_pipeline` is a tool for benchmarking end-to-end inference, including pre- and post-processing. The script can generate fake input data based on the pipeline's input schema, or load it from a local folder. The pipeline then runs pre-processing, engine inference, and post-processing, and results are reported per section, which is useful for identifying bottlenecks.
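
Conceptually, the tool wraps a standard deepsparse `Pipeline` call in a timed loop. The sketch below is only an illustrative approximation of that loop, not the PR's implementation; the zoo stub is taken from the examples later in this README, and `sample.jpg` stands in for any local image:

```python
# Illustrative approximation of the benchmark loop (not the PR's code):
# run a pipeline repeatedly for a fixed duration and aggregate latencies.
import time

from deepsparse import Pipeline

pipeline = Pipeline.create(
    task="image_classification",
    model_path="zoo:cv/classification/resnet_v1-50_2x/pytorch/sparseml/imagenet/base-none",
)

latencies_ms = []
deadline = time.perf_counter() + 10.0  # comparable to passing -t 10
while time.perf_counter() < deadline:
    start = time.perf_counter()
    # one end-to-end call: pre-process -> engine_forward -> post-process
    pipeline(images=["sample.jpg"])
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"Iterations: {len(latencies_ms)}")
print(f"Mean latency (ms/batch): {sum(latencies_ms) / len(latencies_ms):.4f}")
```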

### Usage
Input arguments are the same as for the engine benchmarker (`deepsparse.benchmark`), with two additions:

```
positional arguments:
  task_name             Type of pipeline to run (e.g. "text_generation")

optional arguments:
  -c INPUT_CONFIG, --input_config INPUT_CONFIG
                        JSON file containing schema for input data
```

The `input_config` argument is a path to a JSON file that configures the inputs to the pipeline, as detailed below.

### Configuring Pipeline Inputs

Inputs to the pipeline are configured through a JSON config file. Set the `data_type` field to `"dummy"` to pass randomly generated data through the pipeline, or to `"real"` to pass in data loaded from files.

#### Dummy Input Configuration
An example dummy input configuration is shown below.
* `gen_sequence_length`: number of characters to generate for pipelines that take text input
* `input_image_shape`: image size for pipelines that take image input; must be 3-dimensional with the channel dimension last

```json
{
    "data_type": "dummy",
    "gen_sequence_length": 100,
    "input_image_shape": [500, 500, 3],
    "pipeline_kwargs": {},
    "input_schema_kwargs": {}
}
```

#### Real Input Configuration
An example real input configuration is shown below.
* `data_folder`: path to a local folder of input data, containing text or image files
* `recursive_search`: whether to search `data_folder` recursively for files
* `max_string_length`: maximum number of characters to read from each text file; -1 for no limit

```json
{
    "data_type": "real",
    "data_folder": "/home/sadkins/imagenette2-320/",
    "recursive_search": true,
    "max_string_length": -1,
    "pipeline_kwargs": {},
    "input_schema_kwargs": {}
}
```
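
The `BaseModel for pipeline config` commit suggests the config file is validated with a pydantic model. A minimal sketch of such a schema is below; the class name and default values are assumptions for illustration, not the PR's actual code:

```python
# Sketch of a config schema; field names mirror the JSON keys above, but the
# class name and default values are assumptions, not taken from the PR.
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class PipelineInputConfig(BaseModel):
    data_type: str = "dummy"  # "dummy" generates inputs, "real" reads files
    gen_sequence_length: int = 100  # characters of generated text input
    input_image_shape: Optional[List[int]] = None  # [H, W, C], channel last
    data_folder: Optional[str] = None  # folder of text/image files ("real")
    recursive_search: bool = False  # search data_folder recursively
    max_string_length: int = -1  # per-file character cap, -1 for no limit
    pipeline_kwargs: Dict[str, Any] = Field(default_factory=dict)
    input_schema_kwargs: Dict[str, Any] = Field(default_factory=dict)


config = PipelineInputConfig.parse_file("config.json")  # pydantic v1 API
```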

#### Keyword Arguments
Additional arguments can be passed to the pipeline or to its input schema via the `pipeline_kwargs` and `input_schema_kwargs` fields, respectively. For instance, to pass `class_names` to a YOLO pipeline and `conf_thres` to its input schema:
```json
{
    "data_type": "dummy",
    "input_image_shape": [500, 500, 3],
    "pipeline_kwargs": {
        "class_names": ["classA", "classB"]
    },
    "input_schema_kwargs": {
        "conf_thres": 0.7
    }
}
```
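
Presumably the two groups are forwarded to pipeline construction and to each inference call, respectively. The sketch below shows that plumbing under those assumptions; the PR's exact wiring may differ, and `sample.jpg` is a placeholder:

```python
# Assumed plumbing: pipeline_kwargs feed Pipeline.create, while
# input_schema_kwargs accompany each inference call.
import json

from deepsparse import Pipeline

with open("config.json") as f:
    config = json.load(f)

pipeline = Pipeline.create(
    task="yolo",  # a model_path / zoo stub may also be required here
    **config.get("pipeline_kwargs", {}),  # e.g. class_names
)
output = pipeline(
    images=["sample.jpg"],
    **config.get("input_schema_kwargs", {}),  # e.g. conf_thres
)
```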

### Example Usage

Running ResNet-50 image classification for 60 seconds with a batch size of 32:
```
deepsparse.benchmark_pipeline image_classification zoo:cv/classification/resnet_v1-50_2x/pytorch/sparseml/imagenet/base-none -c config.json -t 60 -b 32
```

Running CodeGen text generation asynchronously for 30 seconds:
```
deepsparse.benchmark_pipeline text_generation zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/pruned50-none -c config.json -t 30 -s async
```
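
In the `async` scenario the pipeline is driven by multiple concurrent streams rather than a single blocking loop; per-section percentages are normalized against total wall time (see the `fix async percentages` commits above), and the stream count can presumably be tuned with the shared `--num_streams` option from the engine benchmarker.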
### Example Output
Command:
```
deepsparse.benchmark_pipeline text_classification zoo:nlp/sentiment_analysis/distilbert-none/pytorch/huggingface/sst2/pruned90-none -c config.json
```
config.json:
```json
{
    "data_type": "real",
    "gen_sequence_length": 1000,
    "data_folder": "/home/sadkins/text_data/",
    "recursive_search": true,
    "max_string_length": -1
}
```

Output:
```
Batch Size: 1
Scenario: sync
Iterations: 955
Total Runtime: 10.0090
Throughput (items/sec): 95.4137
Processing Time Breakdown:
    total_inference: 99.49%
    pre_process: 25.70%
    engine_forward: 72.56%
    post_process: 1.03%
Mean Latency Breakdown (ms/batch):
    total_inference: 10.4274
    pre_process: 2.6938
    engine_forward: 7.6051
    post_process: 0.1077
```
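
The section percentages appear to be shares of total runtime, so the three stages roughly sum to the `total_inference` share (25.70% + 72.56% + 1.03% ≈ 99.29%), and the mean latencies likewise add up (2.6938 + 7.6051 + 0.1077 ≈ 10.41 ms/batch, close to the 10.4274 reported for `total_inference`).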

Command:
```
deepsparse.benchmark_pipeline text_generation zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base_quant-none -c config.json -t 60
```
config.json:
```json
{
    "data_type": "dummy",
    "gen_sequence_length": 100,
    "pipeline_kwargs": {},
    "input_schema_kwargs": {}
}
```

Output:
```
Batch Size: 1
Scenario: sync
Iterations: 6
Total Runtime: 62.8005
Throughput (items/sec): 0.0955
Processing Time Breakdown:
    total_inference: 100.00%
    pre_process: 0.00%
    engine_forward: 99.98%
    post_process: 0.01%
    engine_prompt_prefill: 5.83%
    engine_prompt_prefill_single: 0.09%
    engine_token_generation: 93.64%
    engine_token_generation_single: 0.09%
Mean Latency Breakdown (ms/batch):
    total_inference: 20932.4786
    pre_process: 0.9729
    engine_forward: 20930.2190
    post_process: 1.2150
    engine_prompt_prefill: 1220.7037
    engine_prompt_prefill_single: 19.0412
    engine_token_generation: 19603.0353
    engine_token_generation_single: 19.1170
```
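
For text generation, additional timers are nested inside `engine_forward`: `engine_prompt_prefill` and `engine_token_generation` cover the full prefill and decode loops (5.83% + 93.64% is roughly the 99.98% spent in `engine_forward`), while the `_single` variants appear to report the mean cost of one engine call inside each loop, here about 19 ms per token.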
81 changes: 7 additions & 74 deletions src/deepsparse/benchmark/benchmark_model.py
@@ -95,10 +95,15 @@
import importlib
import json
import logging
-import os
from typing import Dict

-from deepsparse import Scheduler, __version__, compile_model
+from deepsparse import __version__, compile_model
+from deepsparse.benchmark.helpers import (
+    decide_thread_pinning,
+    parse_num_streams,
+    parse_scenario,
+    parse_scheduler,
+)
from deepsparse.benchmark.ort_engine import ORTEngine
from deepsparse.benchmark.stream_benchmark import model_stream_benchmark
from deepsparse.cpu import cpu_architecture
@@ -241,78 +246,6 @@ def parse_args():
    return parser.parse_args()


-def decide_thread_pinning(pinning_mode: str) -> None:
-    pinning_mode = pinning_mode.lower()
-    if pinning_mode in "core":
-        os.environ["NM_BIND_THREADS_TO_CORES"] = "1"
-        _LOGGER.info("Thread pinning to cores enabled")
-    elif pinning_mode in "numa":
-        os.environ["NM_BIND_THREADS_TO_CORES"] = "0"
-        os.environ["NM_BIND_THREADS_TO_SOCKETS"] = "1"
-        _LOGGER.info("Thread pinning to socket/numa nodes enabled")
-    elif pinning_mode in "none":
-        os.environ["NM_BIND_THREADS_TO_CORES"] = "0"
-        os.environ["NM_BIND_THREADS_TO_SOCKETS"] = "0"
-        _LOGGER.info("Thread pinning disabled, performance may be sub-optimal")
-    else:
-        _LOGGER.info(
-            "Received invalid option for thread_pinning '{}', skipping".format(
-                pinning_mode
-            )
-        )
-
-
-def parse_scheduler(scenario: str) -> Scheduler:
-    scenario = scenario.lower()
-    if scenario == "multistream":
-        return Scheduler.multi_stream
-    elif scenario == "singlestream":
-        return Scheduler.single_stream
-    elif scenario == "elastic":
-        return Scheduler.elastic
-    else:
-        return Scheduler.multi_stream
-
-
-def parse_scenario(scenario: str) -> str:
-    scenario = scenario.lower()
-    if scenario == "async":
-        return "multistream"
-    elif scenario == "sync":
-        return "singlestream"
-    elif scenario == "elastic":
-        return "elastic"
-    else:
-        _LOGGER.info(
-            "Received invalid option for scenario '{}', defaulting to async".format(
-                scenario
-            )
-        )
-        return "multistream"
-
-
-def parse_num_streams(num_streams: int, num_cores: int, scenario: str):
-    # If model.num_streams is set, and the scenario is either "multi_stream" or
-    # "elastic", use the value of num_streams given to us by the model, otherwise
-    # use a semi-sane default value.
-    if scenario == "sync" or scenario == "singlestream":
-        if num_streams and num_streams > 1:
-            _LOGGER.info("num_streams reduced to 1 for singlestream scenario.")
-        return 1
-    else:
-        if num_streams:
-            return num_streams
-        else:
-            default_num_streams = max(1, int(num_cores / 2))
-            _LOGGER.info(
-                "num_streams default value chosen of {}. "
-                "This requires tuning and may be sub-optimal".format(
-                    default_num_streams
-                )
-            )
-            return default_num_streams


def load_custom_engine(custom_engine_identifier: str):
"""
import a custom engine based off the specified `custom_engine_identifier`