
[LLAMA] KV Cache Injection #1709

Merged 7 commits into main from llama_update on Aug 29, 2023

Conversation

@dsikka (Contributor) commented on Aug 21, 2023

  • Note: LLAMA2 in general requires the most up-to-date nm-transformers

Summary

Add support for LLAMA2 KV Cache Injection:

  • Add a new config for the LLM in configs.py
  • The transforms added for the positions and causal mask reuse the same set of transforms already required/used by the other LLMs we currently support
  • One additional transform updates the slice nodes in the attention heads so that their ends values are rewritten. This is required for the positions injection to work properly (see the sketch after this list)
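
For illustration only, a minimal sketch of what a slice-ends update can look like as ONNX graph surgery; the node-matching criterion and the target value here are assumptions, not the actual transform in this PR:

import numpy as np
import onnx
from onnx import numpy_helper

def update_slice_ends(model: onnx.ModelProto, new_end: int) -> onnx.ModelProto:
    # Index initializers by name so Slice inputs can be resolved quickly
    inits = {init.name: init for init in model.graph.initializer}
    for node in model.graph.node:
        # In opset >= 10, `ends` is the third input of Slice rather than
        # an attribute, so look for an initializer feeding that input
        if node.op_type == "Slice" and len(node.input) >= 3:
            ends = inits.get(node.input[2])
            if ends is None:
                continue
            # Overwrite the ends value so the slice covers the
            # cache-extended sequence dimension
            updated = numpy_helper.from_array(
                np.array([new_end], dtype=np.int64), ends.name
            )
            ends.CopyFrom(updated)
    return model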

Testing:

import onnx
from pathlib import Path
from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector

INJECT_NEW = True

MODEL_NAME = "deployment_sparseml/model.onnx"
root = Path("/home/dsikka/llama_run/deployment_sparseml")

if INJECT_NEW:
    # Load the original export (with external data) and inject the KV cache
    model = onnx.load(str(root / "model_og.onnx"), load_external_data=True)
    model = KeyValueCacheInjector(str(root)).apply(model)
    onnx.save(model, MODEL_NAME, all_tensors_to_one_file=True, save_as_external_data=True)

    # Checking by path lets the checker resolve the external tensor data
    try:
        onnx.checker.check_model(MODEL_NAME)
    except onnx.checker.ValidationError as e:
        print(e)
    else:
        print("Valid")
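
As a quick sanity check beyond the checker, the injected graph's inputs and outputs can be listed to confirm the new cache tensors appear (the exact cache tensor names depend on the injector's naming convention):

# Continuing from the snippet above
injected = onnx.load(MODEL_NAME)
print([inp.name for inp in injected.graph.input])
print([out.name for out in injected.graph.output])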

With this injected model, we can currently run the model in the pipeline using ORT:

from deepsparse import Pipeline

llama = Pipeline.create(
    task="text-generation",
    model_path="/home/dsikka/llama_run/deployment_sparseml",
    engine_type="onnxruntime",
)

inference = llama(sequences="Who is your favourite Toronto Raptor?")
print(inference)

Output:

sequences=["\n\nI'm a big fan of Kyle Lowry, he's a great point guard and leader on the team. But I also really enjoy watching Pascal Siakam, he's a talented young player with a lot of potential. And of course, I can't forget about Kawhi Leonard, he's a incredible player and a key part of the team's success."] logits=None session_id=None
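
For anyone who wants to exercise the injected ONNX directly, outside the pipeline, a rough ORT smoke test along these lines should work. The zero-filled dummy feeds and the substitution of 1 for symbolic dimensions are assumptions for the sketch, not part of the PR:

import numpy as np
import onnxruntime

sess = onnxruntime.InferenceSession(
    "/home/dsikka/llama_run/deployment_sparseml/model.onnx",
    providers=["CPUExecutionProvider"],
)

# Build zero-filled feeds, substituting 1 for any symbolic dimension
feeds = {}
for inp in sess.get_inputs():
    shape = [dim if isinstance(dim, int) else 1 for dim in inp.shape]
    dtype = np.int64 if "int64" in inp.type else np.float32
    feeds[inp.name] = np.zeros(shape, dtype=dtype)

outputs = sess.run(None, feeds)
print([out.shape for out in outputs])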

@dbogunowicz (Contributor) commented:
This looks great @dsikka . We need to figure out the fix for the positions and then we are good to go!

@bfineran (Member) left a comment:

looks great - see comment

dbogunowicz previously approved these changes Aug 28, 2023
@dsikka dsikka merged commit df570d1 into main Aug 29, 2023
10 checks passed
@dsikka dsikka deleted the llama_update branch August 29, 2023 15:30