[Text Generation] Detect dtype of kv cache (float32/uint8) for text generation models #1123

dbogunowicz · 2023-07-17T14:02:50Z

The NLDecoderEngine can now infer the dtype of the kv cache inputs (float32 or uint8) from the ONNX graph. This is necessary to enforce the correct dtype when creating the initial kv cache arrays.

This PR is complementary to neuralmagic/sparseml#1648. Refer to that PR for a description of the manual tests.
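For readers unfamiliar with the idea, the dtype lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `GraphInput`, `infer_kv_cache_dtype`, and the `past_key_values` name prefix are assumptions made for the example. With the `onnx` package, the same information would typically be read from `model.graph.input`, where each input exposes `name` and `type.tensor_type.elem_type`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# ONNX TensorProto element-type codes (FLOAT = 1, UINT8 = 2).
ONNX_FLOAT = 1
ONNX_UINT8 = 2

# Map element-type codes to dtype names that could be passed to numpy.dtype().
ELEM_TYPE_TO_DTYPE = {ONNX_FLOAT: "float32", ONNX_UINT8: "uint8"}

# Assumption: all kv cache inputs share this name prefix.
CACHE_INPUT_PREFIX = "past_key_values"


@dataclass
class GraphInput:
    """Hypothetical stand-in for one ONNX graph input (name + element type)."""

    name: str
    elem_type: int


def infer_kv_cache_dtype(inputs: Iterable[GraphInput]) -> Optional[str]:
    """Return the dtype name of the first kv cache input, or None if the
    graph has no kv cache inputs."""
    for graph_input in inputs:
        if not graph_input.name.startswith(CACHE_INPUT_PREFIX):
            continue
        if graph_input.elem_type not in ELEM_TYPE_TO_DTYPE:
            raise ValueError(
                f"Unsupported kv cache element type: {graph_input.elem_type}"
            )
        return ELEM_TYPE_TO_DTYPE[graph_input.elem_type]
    return None


# Usage: a quantized model whose cache inputs are uint8 (7 is the INT64 code).
quantized_inputs = [
    GraphInput("input_ids", 7),
    GraphInput("past_key_values.0.key", ONNX_UINT8),
    GraphInput("past_key_values.0.value", ONNX_UINT8),
]
kv_cache_dtype = infer_kv_cache_dtype(quantized_inputs)
```

The returned name would then be used when allocating the initial (empty) cache arrays, so that a quantized model receives uint8 buffers instead of float32 ones.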

bfineran · 2023-07-18T13:24:20Z

Let's add a test plan to the description.

dbogunowicz · 2023-07-18T13:27:40Z

@bfineran but the appropriate tests are laid out in detail in the sparseml counterpart.

src/deepsparse/transformers/engines/nl_decoder_engine.py

dbogunowicz added 2 commits July 17, 2023 11:08

initial implementation

b319fa3

initial commit

5da22c6

dbogunowicz requested review from bfineran and dsikka July 17, 2023 14:03

dbogunowicz assigned rahul-tuli Jul 17, 2023

dbogunowicz requested a review from Satrat July 17, 2023 14:03

bfineran approved these changes Jul 17, 2023

dbogunowicz mentioned this pull request Jul 18, 2023

[Fix] Fix the KV Cache insertion logic for quantized OPT neuralmagic/sparseml#1648

Merged

dbogunowicz assigned dbogunowicz and unassigned rahul-tuli Jul 18, 2023

Merge branch 'main' into feature/damian/enable_inference_w_quant_models

844339b

Satrat reviewed Jul 18, 2023

src/deepsparse/transformers/engines/nl_decoder_engine.py (resolved)

Satrat reviewed Jul 18, 2023

src/deepsparse/transformers/engines/nl_decoder_engine.py (resolved)

Satrat approved these changes Jul 18, 2023

dbogunowicz merged commit ad998df into main Jul 18, 2023
7 checks passed

dbogunowicz deleted the feature/damian/enable_inference_w_quant_models branch July 18, 2023 13:42
