Productionize Chat demo #1235

Merged
merged 4 commits into from
Sep 25, 2023

Member

@rahul-tuli rahul-tuli commented Sep 11, 2023

This PR is a productionized version of #1229 originally written by @dbogunowicz;
this is essentially the same code with:

  • Arguments migrated to click
  • Tokens/sec info added
  • More arguments added to control pipeline creation

Note: streaming mode will be added as a part of a separate PR

Example usage:

python examples/chatbot_llm/main.py /home/rahul/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-base/deployment --show_tokens_per_sec

Output:

 python examples/chatbot-llm/chatbot.py \
    zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none --show_tokens_per_sec
2023-09-25 09:17:03 deepsparse.transformers WARNING  The neuralmagic fork of transformers may not be installed. It can be installed via `pip install nm_transformers`
Using pad_token, but it is not set yet.
2023-09-25 09:17:15 deepsparse.transformers.pipelines.text_generation INFO     Compiling an auxiliary engine to process a prompt with a larger processing length. This improves performance, but may result in additional memory consumption.
2023-09-25 09:17:16 deepsparse.utils.onnx INFO     Overwriting in-place the input shapes of the transformer model at /home/rahul/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-base/model.onnx
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230910 COMMUNITY | (c66b57da) (release) (optimized) (system=avx512, binary=avx512)
2023-09-25 09:17:25 deepsparse.utils.onnx INFO     Overwriting in-place the input shapes of the transformer model at /home/rahul/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-base/model.onnx
User: def fib(
2023-09-25 09:17:46 deepsparse.transformers.utils.helpers INFO      No GenerationConfig detected. Using GenerationDefaults values
Bot:  n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib2(n-1) + fib2(n-2)

def fib3(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib3(n-1) + fib3(n-2)

def fib4(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib4(n-1) + fib4(n-2)

def fib5(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib5(n-1) + fib5(n-2)

def fib6(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib6(n-1) + fib6(n-2)

def fib7(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib7(n-1) + fib7(n-2)

def fib8(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib8(n-1) + fib8(n-2)

def fib9(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib9(n-1) + fib9(n-2)

def fib10(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib10(n-1) + fib10(n-2)

def fib11(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib11(n-1) + fib11(n-2)

def fib12(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib12(n-1) + fib12(n-2)

def fib13(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib13(n-1) + fib13(n-2)

def fib14(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib14(n-1) + fib14(n-2)

def fib15(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib15(n-1) + fib15(n-2)

def fib16(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib16(n-1) + fib16(n-2)

def fib17(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib17(n-1) + fib17(n-2)

def fib18(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib18(n-1) + fib18(n-2)

def fib19(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib19(n-1) + fib19(n-2)

def fib20(n):
    if n == 0:
        return 0
    elif n ==
[prefill: 1437.73 tokens/sec]
[decode: 31.21 tokens/sec]
User: ^C
Aborted!
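
The `[prefill: ...]` and `[decode: ...]` lines in the output above report throughput per generation stage. A minimal sketch of how such figures could be computed is shown below. It uses only the standard library; `StageTimer`, `tokens_per_sec`, and `format_report` are hypothetical names for illustration and are not the timer utilities used in this PR.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be exercised deterministically.
        self._clock = clock
        self.seconds = {}

    @contextmanager
    def measure(self, stage):
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            self.seconds[stage] = self.seconds.get(stage, 0.0) + elapsed


def tokens_per_sec(num_tokens, seconds):
    """Throughput for one stage; 0.0 when no time was recorded."""
    return num_tokens / seconds if seconds > 0 else 0.0


def format_report(timer, token_counts):
    """Render one `[stage: N tokens/sec]` line per timed stage."""
    return "\n".join(
        f"[{stage}: {tokens_per_sec(token_counts[stage], secs):.2f} tokens/sec]"
        for stage, secs in timer.seconds.items()
    )
```

In use, the prompt-processing call would be wrapped in `with timer.measure("prefill"):` and the token-by-token generation in `with timer.measure("decode"):`, with the token counts for each stage supplied by the caller.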

Contributor

@dsikka dsikka left a comment

  • test with different models (llama, mpt)
  • add prompt_length to the arguments
  • add a docstring showing a chat example
  • streaming mode (optional scenario)
  • display tokens per second; optional (default to False) --> timer manager

@rahul-tuli
Member Author

Noting that streaming mode will be added as part of another PR; all other comments have been addressed!

@rahul-tuli rahul-tuli marked this pull request as ready for review September 11, 2023 18:39
@rahul-tuli rahul-tuli self-assigned this Sep 11, 2023
@dbogunowicz dbogunowicz mentioned this pull request Sep 12, 2023
Contributor

@dbogunowicz dbogunowicz left a comment

Can we have a short demo output in the PR description?

examples/chatbot_llm/main.py Outdated Show resolved Hide resolved
dsikka previously approved these changes Sep 13, 2023
Contributor

@dsikka dsikka left a comment

LGTM overall; two comments:

  • nit: name main.py is super generic
  • a docstring of an expected output might be helpful

Satrat previously approved these changes Sep 13, 2023
bfineran previously approved these changes Sep 13, 2023
Member

@mgoin mgoin left a comment

Is chatbot.py supposed to replace main.py? It looks like they are duplicated. Also, I assume using the session_id keeps the previous history to feed in each time? Can there be an option to not keep history and instead just feed in the current prompt?
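
The two behaviours asked about here (feeding the accumulated history each turn versus only the current prompt) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `ChatSession` and `keep_history` are hypothetical names, `generate` is a stand-in for whatever callable produces the model's reply, and the sketch does not model how `session_id` works in the actual pipeline.

```python
from typing import Callable, List


class ChatSession:
    """Builds the prompt for each turn, with or without prior turns (hypothetical helper)."""

    def __init__(self, generate: Callable[[str], str], keep_history: bool = True):
        self._generate = generate  # stand-in for the text generation call
        self._keep_history = keep_history
        self._turns: List[str] = []

    def ask(self, user_message: str) -> str:
        # With history, earlier turns are prepended to the new message;
        # without it, only the new message is sent to the model.
        context = self._turns if self._keep_history else []
        prompt = "".join(context) + user_message
        reply = self._generate(prompt)
        if self._keep_history:
            self._turns.extend([user_message, reply])
        return reply
```

With `keep_history=False` every call to `ask` is independent, which keeps the prompt short at the cost of losing earlier context.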

Member

@mgoin mgoin left a comment

I don't think an __init__.py is needed for an example and it would be nice to have a small README.md - nice changes!

examples/chatbot_llm/chatbot.py Outdated Show resolved Hide resolved
mgoin previously approved these changes Sep 18, 2023
Member

@mgoin mgoin left a comment

Nice!

@rahul-tuli rahul-tuli changed the base branch from feature/damian/chat_demo to main September 25, 2023 13:29
@rahul-tuli rahul-tuli dismissed stale reviews from mgoin, bfineran, Satrat, and dsikka September 25, 2023 13:29

The base branch was changed.

dbogunowicz previously approved these changes Sep 25, 2023
examples/chatbot-llm/chatbot.py Outdated Show resolved Hide resolved
Member

@bfineran bfineran left a comment

LGTM for initial land - let's get a persistent session id added in

Member

@bfineran bfineran left a comment

Let's rename this to infer.py and move it to source. Also see the comment on history.

examples/chatbot-llm/chatbot.py Outdated Show resolved Hide resolved
Update fire condition
Add cli callable
@bfineran bfineran merged commit c3b313c into main Sep 25, 2023
13 checks passed
@bfineran bfineran deleted the chat_demo_prod branch September 25, 2023 20:13
6 participants