[server] Update OpenAI endpoints #1445

Merged · 15 commits merged into main on Dec 13, 2023

Conversation

@dsikka (Contributor) commented Nov 30, 2023

Summary

  • Add the /v1/completions endpoint
  • Update /v1/chat/completions to accept/handle FastChat-compliant dictionaries

Testing

  • Using the OpenAI Python client:
from openai import OpenAI

# Point the client at the locally running deepsparse server
client = OpenAI(base_url="http://localhost:5543/v1", api_key="EMPTY")

# List the models served by the endpoint
models = client.models.list()

model = "hf:neuralmagic/mpt-7b-chat-pruned50-quant"
print(f"Accessing model API '{model}'")


# Chat Completions API (messages must be a list of role/content dicts)
stream = True
completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Talk about the Toronto Raptors."}],
    stream=stream,
    max_tokens=100,
    model=model,
)

print("Chat results:")
if stream:
    for chunk in completion:
        print(chunk)
else:
    print(completion)


# Completions API
stream = True
completion = client.completions.create(
    prompt="How are you today?",
    stream=stream,
    max_tokens=100,
    model=model,
)

print("Completion results:")
if stream:
    for chunk in completion:
        print(chunk)
else:
    print(completion)
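
The same endpoints can also be exercised with plain HTTP. A minimal sketch using the requests library; the port matches the base_url above, and the payload fields are assumed to mirror the OpenAI-style request shapes exercised in the client script (not an exhaustive list of supported parameters):

import requests

# Assumed server address; matches the base_url used in the client script above
base = "http://localhost:5543/v1"
model = "hf:neuralmagic/mpt-7b-chat-pruned50-quant"

# /v1/chat/completions expects a FastChat/OpenAI-style list of message dicts
chat_body = {
    "model": model,
    "messages": [{"role": "user", "content": "Talk about the Toronto Raptors."}],
    "max_tokens": 100,
    "stream": False,
}
print(requests.post(f"{base}/chat/completions", json=chat_body).json())

# /v1/completions takes a plain prompt string
completion_body = {
    "model": model,
    "prompt": "How are you today?",
    "max_tokens": 100,
    "stream": False,
}
print(requests.post(f"{base}/completions", json=completion_body).json())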

@dsikka dsikka marked this pull request as ready for review December 1, 2023 01:28
@dbogunowicz previously approved these changes Dec 1, 2023
@mgoin (Member) commented Dec 6, 2023

I ran through the script using hf:neuralmagic/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds as the model, after installing the extra dependencies with pip install fschat accelerate.

Looks like something about the last message handshake went wrong.

client.txt

ChatCompletionChunk(id='cmpl-c735b32f15c043b49893cd6a0ac7ab96', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, role=None, tool_calls=None), finish_reason='length', index=0)], created=1701898636, model='hf:neuralmagic/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds', object='chat.completion.chunk', system_fingerprint=None)
httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)

server.txt

  File "/Users/mgoin/code/deepsparse/src/deepsparse/server/openai_server.py", line 159, in abort_request
    await pipeline.abort(request_id)
AttributeError: 'TextGenerationPipeline' object has no attribute 'abort'
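
For reference, a minimal defensive sketch of abort_request, assuming the intent is simply to skip the abort call when the underlying pipeline does not implement it (illustration only, not the actual fix in this PR):

async def abort_request(request_id: str, pipeline) -> None:
    # Not every pipeline exposes an abort() coroutine; TextGenerationPipeline
    # currently does not, so guard before awaiting it.
    abort = getattr(pipeline, "abort", None)
    if abort is not None:
        await abort(request_id)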

@dsikka (Contributor, Author) commented Dec 7, 2023


What script did you use? The example script in the PR description? That seems to work for me. If you send me your code/example, I can investigate.

@rahul-tuli (Member) previously approved these changes Dec 12, 2023 and left a comment:


LGTM!

@dsikka merged commit 3b09d2f into main on Dec 13, 2023 (13 checks passed)
@dsikka deleted the server_update branch on December 13, 2023 at 18:29