[TextGeneration] Update pipeline inputs to support GenerationConfig #1250

Merged

19 commits merged into main from update_inputs on Sep 22, 2023

Conversation

dsikka (Contributor)

@dsikka dsikka commented Sep 17, 2023

Summary:

  • Update the pipeline input to support a transformers.GenerationConfig. This config replaces many of the inputs we previously provided as separate fields (such as num_return_sequences, top_k, etc.).
  • Supports a path to a JSON config file, a dictionary, or a transformers.GenerationConfig object.
  • The user can provide the config at either the pipeline level or the input level. If an input-level config is provided, it overrides the pipeline-level config and is used for generation. Otherwise, the config provided during pipeline creation is used. If neither is given, the defaults set in the GenerationDefaults class are used.
  • If the user provides a generation config (at either level) but does not set all of its values, the missing values fall back to the defaults defined by GenerationConfig itself, not by the GenerationDefaults class.
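The resolution order described above can be sketched roughly as follows. Note that resolve_generation_config and the GenerationDefaults attributes here are illustrative only, not the actual deepsparse implementation:

```python
# Illustrative sketch of the config-resolution order; names are hypothetical.
import json
from pathlib import Path
from typing import Optional, Union


class GenerationDefaults:
    """Fallback values, used only when no config is provided at all."""

    num_return_sequences = 1
    max_length = 1024


def resolve_generation_config(
    pipeline_config: Optional[Union[str, Path, dict]],
    input_config: Optional[Union[str, Path, dict]],
) -> dict:
    # An input-level config takes priority over the pipeline-level one.
    config = input_config if input_config is not None else pipeline_config
    if config is None:
        # Neither level provided a config: fall back to GenerationDefaults.
        return {
            "num_return_sequences": GenerationDefaults.num_return_sequences,
            "max_length": GenerationDefaults.max_length,
        }
    # A str/Path is treated as a path to a JSON config file.
    if isinstance(config, (str, Path)):
        config = json.loads(Path(config).read_text())
    # When a config IS provided, any fields it leaves unset fall back to
    # GenerationConfig's own defaults (not modeled in this sketch).
    return dict(config)
```

This mirrors the precedence in the bullets: input-level config, then pipeline-level config, then GenerationDefaults.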

Test Cases

Dictionary:

from deepsparse import Pipeline

pipeline = Pipeline.create(
   task="text_generation",
   model_path="/home/dsikka/.cache/sparsezoo/neuralmagic/opt-1.3b-opt_pretrain-quantW8A8/deployment",
   engine_type="onnxruntime"
)
generation_config = {
   "num_return_sequences": 2,
   "max_length": 100
}
inference = pipeline(sequences=["hello?", "cool"], generation_config=generation_config)
print(next(inference))

String or Path

from deepsparse import Pipeline

pipeline = Pipeline.create(
   task="text_generation",
   model_path="/home/dsikka/.cache/sparsezoo/neuralmagic/opt-1.3b-opt_pretrain-quantW8A8/deployment",
   engine_type="onnxruntime"
)
generation_config_path = "/home/dsikka/llama_run/current_config.json"
inference = pipeline(sequences=["hello?", "cool"], generation_config=generation_config_path)
print(next(inference))
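For reference, a config passed as a path is just the GenerationConfig fields serialized to JSON. A hypothetical current_config.json (using the same fields as the dictionary example above) might contain:

```json
{
  "num_return_sequences": 2,
  "max_length": 100
}
```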

GenerationConfig object

from deepsparse import Pipeline
from transformers import GenerationConfig

pipeline = Pipeline.create(
   task="text_generation",
   model_path="/home/dsikka/.cache/sparsezoo/neuralmagic/opt-1.3b-opt_pretrain-quantW8A8/deployment",
   engine_type="onnxruntime"
)

generation_config_obj = GenerationConfig(
   num_return_sequences=3,
   max_length=100,
   output_scores=True
)
inference = pipeline(sequences=["hello?", "cool"], generation_config=generation_config_obj)
print(next(inference))

None: no generation config is provided; the GenerationDefaults will be used instead

from deepsparse import Pipeline

pipeline = Pipeline.create(
   task="text_generation",
   model_path="/home/dsikka/.cache/sparsezoo/neuralmagic/opt-1.3b-opt_pretrain-quantW8A8/deployment",
   engine_type="onnxruntime"
)

inference = pipeline(sequences=["hello?", "cool"])
for out in inference:
   print(out)
   print("\n")

Set GenerationConfig on the pipeline level

  • The config will be used for every input passed to the pipeline
from deepsparse import Pipeline
from transformers import GenerationConfig

generation_config_obj = GenerationConfig(
   num_return_sequences=3,
   max_length=100,
)

pipeline = Pipeline.create(
   task="text_generation",
   model_path="/home/dsikka/.cache/sparsezoo/neuralmagic/opt-1.3b-opt_pretrain-quantW8A8/deployment",
   engine_type="onnxruntime",
   generation_config=generation_config_obj
)

inference = pipeline(sequences=["hello?"])
for out in inference:
   print(out)
   print("\n")

inference = pipeline(sequences=["cool"])
for out in inference:
   print(out)
   print("\n")
  • Output: 3 text generations per prompt, as set by the pipeline-level config
2023-09-19 11:31:14 deepsparse.transformers.pipelines.text_generation INFO     Generation config provided for pipline. This will be used for all inputs unless and input-specific config is provided. 
created=datetime.datetime(2023, 9, 19, 11, 32, 43, 387168) prompts=['hello?'] generations=[[GeneratedText(text='\n\nI am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am', score=None, finished=True, finished_reason='length'), GeneratedText(text='\n\nI am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am', score=None, finished=True, finished_reason='length'), GeneratedText(text='\n\nI am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am', score=None, finished=True, finished_reason='length')]] session_id=None


created=datetime.datetime(2023, 9, 19, 11, 32, 54, 414539) prompts=['cool'] generations=[[GeneratedText(text=", i'll be there.\nI'll be there too.", score=None, finished=True, finished_reason='stop'), GeneratedText(text=", i'll be there.\nI'll be there too.", score=None, finished=True, finished_reason='stop'), GeneratedText(text=", i'll be there.\nI'll be there too.", score=None, finished=True, finished_reason='stop')]] session_id=None

Set GenerationConfig on the pipeline level, override on the input level

from deepsparse import Pipeline
from transformers import GenerationConfig

generation_config_obj = GenerationConfig(
   num_return_sequences=3,
   max_length=100,
)

pipeline = Pipeline.create(
   task="text_generation",
   model_path="/home/dsikka/.cache/sparsezoo/neuralmagic/opt-1.3b-opt_pretrain-quantW8A8/deployment",
   engine_type="onnxruntime",
   generation_config=generation_config_obj
)

generation_config_obj_input = GenerationConfig(
   num_return_sequences=2,
   max_length=50,
)


inference = pipeline(sequences=["hello?"], generation_config=generation_config_obj_input)
for out in inference:
   print(out)
   print("\n")

inference = pipeline(sequences=["cool"])
for out in inference:
   print(out)
   print("\n")
  • Output:
    For the first prompt, the pipeline config is overridden by the config given with the input, resulting in 2 text generations. For the second input, as no config is given, the pipeline config is used, resulting in 3 text generations.
created=datetime.datetime(2023, 9, 19, 11, 37, 5, 159853) prompts=['hello?'] generations=[[GeneratedText(text='\n\nI am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help.', score=None, finished=True, finished_reason='length'), GeneratedText(text='\n\nI am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help. I am a new member of the forum and I am looking for some help.', score=None, finished=True, finished_reason='length')]] session_id=None


created=datetime.datetime(2023, 9, 19, 11, 37, 17, 76668) prompts=['cool'] generations=[[GeneratedText(text=", i'll be there.\nI'll be there too.", score=None, finished=True, finished_reason='stop'), GeneratedText(text=", i'll be there.\nI'll be there too.", score=None, finished=True, finished_reason='stop'), GeneratedText(text=", i'll be there.\nI'll be there too.", score=None, finished=True, finished_reason='stop')]] session_id=None

src/deepsparse/transformers/pipelines/text_generation.py: review comments resolved
bfineran (Member) previously approved these changes Sep 21, 2023

@bfineran left a comment:

LGTM - can we check in any unit tests for this?

Satrat
Satrat previously approved these changes Sep 21, 2023
Base automatically changed from enable_streaming to main September 21, 2023 20:11
@bfineran bfineran dismissed stale reviews from Satrat and themself September 21, 2023 20:11

The base branch was changed.

bfineran
bfineran previously approved these changes Sep 21, 2023
@bfineran bfineran merged commit b309fa4 into main Sep 22, 2023
13 checks passed
@bfineran bfineran deleted the update_inputs branch September 22, 2023 14:53
5 participants