Benchmark Script for Pipelines (#1150)
* WIP pipeline benchmark script

* simple script

* share code and cleanup

* adding additional cmd line arguments

* image and text inputs

* json export of statistics

* clean up printed output

* adding support for real data

* support for additional pipelines

* expanding input schemas, allowing for kwargs

* README, quality, additional args

* moving code around, update README

* adding unit tests

* adding missing test file

* skipping test w/high memory usage

* skip test with high memory usage

* unit test memory

* add tests back in

* add tests back in

* fix async percentages

* fix new quality errors

* pass num_streams, fix percentage calculation for async

* fix for file loading

* PR comments

* PR comments

* BaseModel for pipeline config

* quality fix

* fix broken test

* cleanup code, replace argparse with click

* Update README with example output

* support for multiple timers, adding docstrings

* docstrings

* add text generation example to README

* clean up timermanager usage

* PR comments

* style

* PR comments
Satrat committed Aug 17, 2023
1 parent 0b93039 commit 545348b
Showing 11 changed files with 1,329 additions and 74 deletions.
1 change: 1 addition & 0 deletions setup.py
@@ -301,6 +301,7 @@ def _setup_entry_points() -> Dict:
"deepsparse.analyze=deepsparse.analyze:main",
"deepsparse.check_hardware=deepsparse.cpu:print_hardware_capability",
"deepsparse.benchmark=deepsparse.benchmark.benchmark_model:main",
"deepsparse.benchmark_pipeline=deepsparse.benchmark.benchmark_pipeline:main", # noqa E501
"deepsparse.benchmark_sweep=deepsparse.benchmark.benchmark_sweep:main",
"deepsparse.server=deepsparse.server.cli:main",
"deepsparse.object_detection.annotate=deepsparse.yolo.annotate:main",
155 changes: 155 additions & 0 deletions src/deepsparse/benchmark/README.md
@@ -186,4 +186,159 @@ Latency Mean (ms/batch): 16.0732
Latency Median (ms/batch): 15.7850
Latency Std (ms/batch): 1.0427
Iterations: 622
```

## 📜 Benchmarking Pipelines
Expanding on the model benchmarking script, `deepsparse.benchmark_pipeline` is a tool for benchmarking end-to-end pipeline inference, including pre- and post-processing. The script can generate fake input data based on the pipeline's input schema, or load it from a local folder. The pipeline then runs pre-processing, engine inference, and post-processing, and the benchmark results are reported per section, which is useful for identifying bottlenecks.

### Usage
Input arguments are the same as for the engine benchmarker (`deepsparse.benchmark`), with two additions:

```
positional arguments:
  task_name             Type of pipeline to run (e.g. "text_generation")
optional arguments:
-c INPUT_CONFIG, --input_config INPUT_CONFIG
JSON file containing schema for input data
```
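
Since the remaining arguments (run time, batch size, scenario, and so on) are inherited from the engine benchmarker, the full list can be printed with the standard help flag:
```
deepsparse.benchmark_pipeline --help
```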

The `input_config` argument is the path to a JSON file specifying the input data to pass to the pipeline, as detailed below.

### Configuring Pipeline Inputs

Inputs to the pipeline are configured through a json config file. The `data_type` field should be set to `"dummy"` if passing randomly generated data through the pipeline, and `"real"` if passing in data from files.

#### Dummy Input Configuration
An example dummy input configuration is shown below.
* `gen_sequence_length`: number of characters to generate for pipelines that take text input
* `input_image_shape`: configures image size for pipelines that take image input; must be 3-dimensional with the channel as the last dimension

```json
{
"data_type": "dummy",
"gen_sequence_length": 100,
"input_image_shape": [500,500,3],
"pipeline_kwargs": {},
"input_schema_kwargs": {}
}
```
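
For illustration only, dummy inputs of the shapes configured above could be generated along these lines (a minimal sketch; the script's actual generation logic follows the pipeline's input schema):
```python
# Minimal sketch of dummy-input generation; helper names are hypothetical.
import random
import string

import numpy as np


def make_dummy_image(shape=(500, 500, 3)) -> np.ndarray:
    # Random uint8 pixels, channel-last as the config above requires
    return np.random.randint(0, 256, size=shape, dtype=np.uint8)


def make_dummy_text(gen_sequence_length: int = 100) -> str:
    # Random characters of the configured length
    return "".join(random.choices(string.ascii_letters + " ", k=gen_sequence_length))
```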

#### Real Input Configuration
An example real input configuration is shown below.
* `data_folder`: path to a local folder of input data; should contain text or image files
* `recursive_search`: whether to recursively search `data_folder` for input files
* `max_string_length`: maximum number of characters to read from each text file; -1 for no limit

```json
{
"data_type": "real",
"data_folder": "/home/sadkins/imagenette2-320/",
"recursive_search": true,
"max_string_length": -1,
"pipeline_kwargs": {},
"input_schema_kwargs": {}
}
```
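
A minimal sketch of how these options plausibly interact when collecting input files (the helper names here are hypothetical, not the script's own):
```python
# Illustrative sketch of collecting real input files per the config above.
from pathlib import Path

TEXT_EXTS = {".txt"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_input_files(data_folder: str, recursive_search: bool) -> list:
    # Glob the folder (recursively if configured) and keep supported files
    pattern = "**/*" if recursive_search else "*"
    return [
        p for p in Path(data_folder).glob(pattern)
        if p.suffix.lower() in TEXT_EXTS | IMAGE_EXTS
    ]


def read_text(path: Path, max_string_length: int = -1) -> str:
    # -1 disables truncation, mirroring the config option above
    text = path.read_text()
    return text if max_string_length < 0 else text[:max_string_length]
```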

#### Keyword Arguments
Additional arguments to the pipeline or input schema can be added to the `pipeline_kwargs` and `input_schema_kwargs` fields respectively. For instance, to pass `class_names` to a YOLO pipeline and `conf_thres` to its input schema:
```json
{
"data_type": "dummy",
"input_image_shape": [500,500,3],
"pipeline_kwargs": {
"class_names": ["classA", "classB"]
},
"input_schema_kwargs": {
"conf_thres": 0.7
}
}
```
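
Conceptually, the two dictionaries are forwarded to the pipeline constructor and to each inference call, roughly as sketched below (the zoo stub and image path are placeholders, not part of the benchmark script):
```python
# Sketch of how the two kwarg dictionaries are conceptually forwarded.
from deepsparse import Pipeline

pipeline_kwargs = {"class_names": ["classA", "classB"]}
input_schema_kwargs = {"conf_thres": 0.7}

# pipeline_kwargs go to the pipeline constructor...
pipeline = Pipeline.create(
    task="yolo",
    model_path="zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-none",  # placeholder stub
    **pipeline_kwargs,
)
# ...while input_schema_kwargs go to each inference call
output = pipeline(images=["sample.jpg"], **input_schema_kwargs)
```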

### Example Usage

Running ResNet image classification for 60 seconds with a batch size of 32:
```
deepsparse.benchmark_pipeline image_classification zoo:cv/classification/resnet_v1-50_2x/pytorch/sparseml/imagenet/base-none -c config.json -t 60 -b 32
```

Running CodeGen text generation for 30 seconds asynchronously:
```
deepsparse.benchmark_pipeline text_generation zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/pruned50-none -c config.json -t 30 -s async
```
### Example Output
Command:
```
deepsparse.benchmark_pipeline text_classification zoo:nlp/sentiment_analysis/distilbert-none/pytorch/huggingface/sst2/pruned90-none -c config.json
```
config.json:
```json
{
"data_type": "real",
"gen_sequence_length": 1000,
"data_folder": "/home/sadkins/text_data/",
"recursive_search": true,
"max_string_length": -1
}
```

Output:
```
Batch Size: 1
Scenario: sync
Iterations: 955
Total Runtime: 10.0090
Throughput (items/sec): 95.4137
Processing Time Breakdown:
total_inference: 99.49%
pre_process: 25.70%
engine_forward: 72.56%
post_process: 1.03%
Mean Latency Breakdown (ms/batch):
total_inference: 10.4274
pre_process: 2.6938
engine_forward: 7.6051
post_process: 0.1077
```
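
The percentage breakdown can be reproduced from the figures above: each section's share is its accumulated time across all iterations divided by the total runtime (a simplified sketch for the sync scenario; the async calculation differs):
```python
# Reproduce the percentage breakdown from the output above.
iterations = 955
total_runtime_s = 10.0090
mean_latency_ms = {
    "pre_process": 2.6938,
    "engine_forward": 7.6051,
    "post_process": 0.1077,
}

for section, mean_ms in mean_latency_ms.items():
    # Total time spent in this section, as a share of the whole run
    section_total_s = mean_ms * iterations / 1000.0
    print(f"{section}: {100.0 * section_total_s / total_runtime_s:.2f}%")
# pre_process: 25.70%, engine_forward: 72.56%, post_process: 1.03%
```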

Command:
```
deepsparse.benchmark_pipeline text_generation zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base_quant-none -c config.json -t 60
```
config.json:
```json
{
"data_type": "dummy",
"gen_sequence_length": 100,
"pipeline_kwargs": {},
"input_schema_kwargs": {}
}
```

Output:
```
Batch Size: 1
Scenario: sync
Iterations: 6
Total Runtime: 62.8005
Throughput (items/sec): 0.0955
Processing Time Breakdown:
total_inference: 100.00%
pre_process: 0.00%
engine_forward: 99.98%
post_process: 0.01%
engine_prompt_prefill: 5.83%
engine_prompt_prefill_single: 0.09%
engine_token_generation: 93.64%
engine_token_generation_single: 0.09%
Mean Latency Breakdown (ms/batch):
total_inference: 20932.4786
pre_process: 0.9729
engine_forward: 20930.2190
post_process: 1.2150
engine_prompt_prefill: 1220.7037
engine_prompt_prefill_single: 19.0412
engine_token_generation: 19603.0353
engine_token_generation_single: 19.1170
```
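
As a sanity check on the text generation output, the `*_single` entries are per-call means, so dividing a section's mean latency by its single-call latency approximates the number of engine calls made per inference:
```python
# Back-of-the-envelope check using the mean latency figures above.
prefill_calls = 1220.7037 / 19.0412      # ≈ 64 prompt-prefill calls
generation_calls = 19603.0353 / 19.1170  # ≈ 1025 token-generation calls
print(round(prefill_calls), round(generation_calls))  # 64 1025
```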
81 changes: 7 additions & 74 deletions src/deepsparse/benchmark/benchmark_model.py
@@ -95,10 +95,15 @@
import importlib
import json
import logging
import os
from typing import Dict

from deepsparse import Scheduler, __version__, compile_model
from deepsparse import __version__, compile_model
from deepsparse.benchmark.helpers import (
decide_thread_pinning,
parse_num_streams,
parse_scenario,
parse_scheduler,
)
from deepsparse.benchmark.ort_engine import ORTEngine
from deepsparse.benchmark.stream_benchmark import model_stream_benchmark
from deepsparse.cpu import cpu_architecture
@@ -241,78 +246,6 @@ def parse_args():
return parser.parse_args()


def decide_thread_pinning(pinning_mode: str) -> None:
pinning_mode = pinning_mode.lower()
if pinning_mode in "core":
os.environ["NM_BIND_THREADS_TO_CORES"] = "1"
_LOGGER.info("Thread pinning to cores enabled")
elif pinning_mode in "numa":
os.environ["NM_BIND_THREADS_TO_CORES"] = "0"
os.environ["NM_BIND_THREADS_TO_SOCKETS"] = "1"
_LOGGER.info("Thread pinning to socket/numa nodes enabled")
elif pinning_mode in "none":
os.environ["NM_BIND_THREADS_TO_CORES"] = "0"
os.environ["NM_BIND_THREADS_TO_SOCKETS"] = "0"
_LOGGER.info("Thread pinning disabled, performance may be sub-optimal")
else:
_LOGGER.info(
"Recieved invalid option for thread_pinning '{}', skipping".format(
pinning_mode
)
)


def parse_scheduler(scenario: str) -> Scheduler:
scenario = scenario.lower()
if scenario == "multistream":
return Scheduler.multi_stream
elif scenario == "singlestream":
return Scheduler.single_stream
elif scenario == "elastic":
return Scheduler.elastic
else:
return Scheduler.multi_stream


def parse_scenario(scenario: str) -> str:
scenario = scenario.lower()
if scenario == "async":
return "multistream"
elif scenario == "sync":
return "singlestream"
elif scenario == "elastic":
return "elastic"
else:
_LOGGER.info(
"Recieved invalid option for scenario'{}', defaulting to async".format(
scenario
)
)
return "multistream"


def parse_num_streams(num_streams: int, num_cores: int, scenario: str):
# If model.num_streams is set, and the scenario is either "multi_stream" or
# "elastic", use the value of num_streams given to us by the model, otherwise
# use a semi-sane default value.
if scenario == "sync" or scenario == "singlestream":
if num_streams and num_streams > 1:
_LOGGER.info("num_streams reduced to 1 for singlestream scenario.")
return 1
else:
if num_streams:
return num_streams
else:
default_num_streams = max(1, int(num_cores / 2))
_LOGGER.info(
"num_streams default value chosen of {}. "
"This requires tuning and may be sub-optimal".format(
default_num_streams
)
)
return default_num_streams


def load_custom_engine(custom_engine_identifier: str):
"""
import a custom engine based off the specified `custom_engine_identifier`