Update README.md
mgoin authored and mwitiderrick committed Nov 29, 2023
1 parent 7ad2c2d commit c6aceea
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion src/sparseml/transformers/sparsification/obcq/README.md
@@ -129,7 +129,19 @@ Injecting KV Cache is done to reduce the model’s computational overhead and sp
This is done by creating a copy of `model.onnx` and injecting the KV Cache:
```bash
cp deployment/model.onnx deployment/model-orig.onnx
python onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
```

Code to inject KV Cache:
```python
import os

import onnx
from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector

input_file = "deployment/model-orig.onnx"
output_file = "deployment/model.onnx"

# Load only the graph; external weight data stays on disk
model = onnx.load(input_file, load_external_data=False)
# Inject the KV Cache, pointing the injector at the directory that holds the model
model = KeyValueCacheInjector(model_path=os.path.dirname(input_file)).apply(model)
onnx.save(model, output_file)
print(f"Modified model saved to: {output_file}")
```
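
The copy step shown earlier with `cp` can also be done from Python before running the injection. The following is a minimal sketch using only the standard library; the helper name `backup_model` and its `suffix` parameter are hypothetical and not part of SparseML. It assumes the backup should sit next to the model and should never be overwritten once it exists:

```python
import shutil
import tempfile
from pathlib import Path


def backup_model(model_path, suffix="-orig"):
    """Copy e.g. model.onnx to model-orig.onnx in the same directory.

    Hypothetical helper (not a SparseML API). An existing backup is left
    untouched, so re-running the step never replaces the original export
    with an already-injected model.
    """
    model_path = Path(model_path)
    backup_path = model_path.with_name(f"{model_path.stem}{suffix}{model_path.suffix}")
    if not backup_path.exists():
        shutil.copy2(model_path, backup_path)
    return backup_path


# Demo on a throwaway directory with a placeholder file (not a real ONNX model)
demo_dir = Path(tempfile.mkdtemp())
demo_model = demo_dir / "model.onnx"
demo_model.write_bytes(b"placeholder")
demo_backup = backup_model(demo_model)
print(demo_backup.name)  # model-orig.onnx
```

In a real run, `backup_model("deployment/model.onnx")` would take the place of the `cp` command, and the returned path could be used as `input_file` for the injection code.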

## <a name="deepsparse">Using the Model With DeepSparse </a>