Replace Dead SparseZoo Stubs in Documentation #1279

Merged · 4 commits · Sep 26, 2023
26 changes: 13 additions & 13 deletions examples/benchmark/resnet50_benchmark.py
````diff
@@ -123,52 +123,52 @@ def main():
     results = benchmark_model(
         (
             "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/"
-            "pruned-conservative"
+            "pruned80_quant-none-vnni"
         ),
         sample_inputs,
         batch_size=batch_size,
         num_cores=num_cores,
         num_iterations=num_iterations,
         num_warmup_iterations=num_warmup_iterations,
     )
-    print(f"ResNet-50 v1 Pruned Conservative FP32 {results}")
+    print(f"ResNet-50 v1 Pruned 80 INT8 {results}")
+
+    if not VNNI:
+        print(
+            "WARNING: VNNI instructions not detected, "
+            "quantization (INT8) speedup not well supported"
+        )
 
     results = benchmark_model(
         (
             "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/"
-            "pruned-moderate"
+            "pruned90-none"
         ),
         sample_inputs,
         batch_size=batch_size,
         num_cores=num_cores,
         num_iterations=num_iterations,
         num_warmup_iterations=num_warmup_iterations,
     )
-    print(f"ResNet-50 v1 Pruned Moderate FP32 {results}")
-
-    if not VNNI:
-        print(
-            "WARNING: VNNI instructions not detected, "
-            "quantization (INT8) speedup not well supported"
-        )
+    print(f"ResNet-50 v1 Pruned 90 FP32 {results}")
 
     results = benchmark_model(
         (
             "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/"
-            "pruned_quant-moderate"
+            "pruned90_quant-none"
         ),
         sample_inputs,
         batch_size=batch_size,
         num_cores=num_cores,
         num_iterations=num_iterations,
         num_warmup_iterations=num_warmup_iterations,
     )
-    print(f"ResNet-50 v1 Pruned Moderate INT8 {results}")
+    print(f"ResNet-50 v1 Pruned 90 INT8 {results}")
 
     results = benchmark_model(
         (
             "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/"
-            "pruned95_quant-none"
+            "pruned95_uniform_quant-none"
         ),
         sample_inputs,
         batch_size=batch_size,
````
12 changes: 6 additions & 6 deletions src/deepsparse/benchmark/README.md
````diff
@@ -20,10 +20,10 @@ limitations under the License.
 
 ### Quickstart
 
-After `pip install deepsparse`, the benchmark tool is available on your CLI. For example, to benchmark a dense BERT ONNX model fine-tuned on the SST2 dataset where the model path is the minimum input required to get started, run:
+After `pip install deepsparse`, the benchmark tool is available on your CLI. For example, to benchmark a dense BERT ONNX model fine-tuned on the MNLI dataset where the model path is the minimum input required to get started, run:
 
 ```
-deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/sst2/base-none
+deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/mnli/base-none
 ```
 __ __
 ### Usage
````
````diff
@@ -94,7 +94,7 @@ optional arguments:
 Example CLI command for benchmarking an ONNX model from the SparseZoo and saving the results to a `benchmark.json` file:
 
 ```
-deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/sst2/base-none -x benchmark.json
+deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/mnli/base-none -x benchmark.json
 ```
 Output of the JSON file:
 
````
````diff
@@ -108,10 +108,10 @@ To run a sparse FP32 MobileNetV1 at batch size 16 for 10 seconds for throughput
 deepsparse.benchmark zoo:cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-moderate --batch_size 16 --time 10 --scenario async --num_streams 8
 ```
````
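In the `async` scenario above, the tool keeps several requests (`--num_streams 8`) in flight at once to maximize throughput. As a toy illustration of that idea only — a dummy `infer` function and a thread pool, not the DeepSparse engine:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def infer(batch):
    """Stand-in for an engine call; sleeps to simulate compute."""
    time.sleep(0.01)
    return [x * 2 for x in batch]

def items_per_sec(num_streams, seconds=0.5, batch_size=16):
    # Submit `num_streams` requests at a time and count completed items
    # until the time budget runs out, mirroring the async scenario.
    done = 0
    deadline = time.perf_counter() + seconds
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        while time.perf_counter() < deadline:
            futures = [pool.submit(infer, [0] * batch_size) for _ in range(num_streams)]
            for fut in futures:
                fut.result()
            done += num_streams * batch_size
    return done / seconds

# More concurrent streams complete more of the sleeping "inferences" per second.
print(f"1 stream: {items_per_sec(1):.0f} items/sec")
print(f"8 streams: {items_per_sec(8):.0f} items/sec")
```

Because the stand-in work only sleeps, throughput here scales almost linearly with the number of streams; a real engine scales only while idle cores remain to absorb the extra requests.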

````diff
-To run a sparse quantized INT8 6-layer BERT at batch size 1 for latency:
+To run a sparse quantized INT8 BERT at batch size 1 for latency:
 
 ```
-deepsparse.benchmark zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant_6layers-aggressive_96 --batch_size 1 --scenario sync
+deepsparse.benchmark zoo:nlp/question_answering/bert-large/pytorch/huggingface/squad/pruned90_quant-none --batch_size 1 --scenario sync
 ```
````
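The `sync` scenario instead runs one request at a time, which is what matters for per-request latency. A minimal sketch of that measurement loop, again with a dummy `infer` standing in for the engine:

```python
import statistics
import time

def infer(batch):
    """Stand-in for a batch-1 engine call; sleeps ~5 ms."""
    time.sleep(0.005)
    return batch

def latency_ms(iterations=50):
    # One request in flight at a time; record wall time per call,
    # mirroring the sync scenario's latency report.
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        infer([0])
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)

median, worst = latency_ms()
print(f"median: {median:.2f} ms, worst: {worst:.2f} ms")
```

Reporting a median (or percentiles) alongside the worst case matters because a single slow call can dominate a plain average.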
__ __
### ⚡ Inference Scenarios
````diff
@@ -341,4 +341,4 @@ Mean Latency Breakdown (ms/batch):
 engine_prompt_prefill_single: 19.0412
 engine_token_generation: 19603.0353
 engine_token_generation_single: 19.1170
-```
\ No newline at end of file
+```
````
6 changes: 3 additions & 3 deletions src/deepsparse/transformers/README.md
````diff
@@ -1,4 +1,4 @@
-# Hugging Face Transformer Inference Pipelines
+x# Hugging Face Transformer Inference Pipelines
 
 DeepSparse allows accelerated inference, serving, and benchmarking of sparsified [Hugging Face Transformer](https://github.com/huggingface/transformers) models.
````

> **Review comment (Contributor):** Looks like a stray "x" got in here and breaks the headline formatting.
````diff
@@ -208,7 +208,7 @@ Spinning up:
 ```bash
 deepsparse.server \
     task sentiment-analysis \
-    --model_path "zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/12layer_pruned80_quant-none-vnni"
+    --model_path "zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/pruned80_quant-none-vnni"
 ```
 
 Making a request:
````
````diff
@@ -314,7 +314,7 @@ Spinning up:
 ```bash
 deepsparse.server \
     task token-classification \
-    --model_path "zoo:nlp/token_classification/bert-base/pytorch/huggingface/conll2003/12layer_pruned80_quant-none-vnni"
+    --model_path "zoo:nlp/token_classification/bert-base/pytorch/huggingface/conll2003/pruned90-none"
 ```
 
 Making a request:
````