[server] Refactor + OpenAI Chat Completion Support #1288

Merged
merged 9 commits into from
Oct 10, 2023


dsikka
Contributor

@dsikka dsikka commented Sep 28, 2023

Summary

Couple of notes about the current integration:

  1. The /v1/chat/completions endpoint is supported (and now also the /v1/models endpoint, as a result of merging in [server] Update OpenAI Model Support #1300)
  2. The endpoints expect a ChatCompletionRequest. Not all the properties supported by the pipeline are supported by the request, and vice versa. For example, the request does not seem to have a way to set do_sample, whereas our pipeline does not currently support "logit_bias", "best_of", "ignore_eos", or "use_beam_search". We map attributes from the ChatCompletionRequest to the text generation pipeline using the following mapping:
OPENAI_TO_DEEPSPARSE_MAPPINGS = {
    "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
    "max_tokens": "max_length",
    "frequency_penalty": "repetition_penalty",
}
  3. As a result of merging in [server] Update OpenAI Model Support #1300, all the models added during server start-up are available for inference through the /v1/chat/completions endpoint
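
The renaming described in note 2 can be sketched roughly as follows. This is only an illustration, not the code in this PR: `map_generation_kwargs` is a hypothetical helper name, and the mapping used here is the two-entry version quoted in the review diff (without the "model" entry).

```python
from typing import Any, Dict

# Request field name -> text generation pipeline argument name.
OPENAI_TO_DEEPSPARSE_MAPPINGS = {
    "max_tokens": "max_length",
    "frequency_penalty": "repetition_penalty",
}


def map_generation_kwargs(request_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rename OpenAI-style fields to pipeline-style names."""
    mapped = {}
    for name, value in request_fields.items():
        if value is None:
            continue  # skip fields the client did not set
        # Fields without a mapping entry keep their original name.
        mapped[OPENAI_TO_DEEPSPARSE_MAPPINGS.get(name, name)] = value
    return mapped


print(map_generation_kwargs({"max_tokens": 30, "frequency_penalty": 1.2, "top_p": 0.9}))
```

Fields that have no entry in the mapping are passed through unchanged in this sketch; whether the real server forwards or drops them is not stated here.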

Unit Testing

  • All tests have been updated and are passing

Testing and Examples - Existing Server

  • We can use the server through the same workflow that was previously used:

Sample Config:

num_cores: 2
num_workers: 2
endpoints:
  - task: text_generation
    model: zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none
  - task: question_answering
    model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni

Launch a server:
deepsparse.server --config_file sample_config.yaml

Requests can be sent the same way as before:

import requests 

url = "http://localhost:5543/v2/models/question_answering-0/infer"
obj = {
    "question": "Who is Mark?", 
    "context": "Mark is batman."
}

response = requests.post(url, json=obj)
print(response.text)

url = "http://localhost:5543/v2/models/text_generation-0/infer"
obj = {
    "prompt": "Hey there!"
}

response = requests.post(url, json=obj)
print(response.text)

Testing and Examples - OpenAI

  • OpenAI can be used through the integration input, similar to how SageMaker was used before:
    deepsparse.server --config_file sample_config.yaml --integration openai. There is also a dedicated workflow through the deepsparse.openai sample_config.yaml command.
  • As before the refactor, multiple models can be hosted on multiple routes for the DeepSparse server and SageMaker integrations. For OpenAI, however, only the text_generation task (and its aliases) is supported. An example server config is shown below.
num_cores: 2
num_workers: 2
endpoints:
  - task: text_generation
    model: zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none
  • The command and config above create the following routes:
[Screenshot of the created routes, 2023-10-09]
  • Requests can be made through the requests library, curl commands or through the OpenAI API
  • All OpenAI requests must comply with the ChatCompletionRequest given by OpenAI, and all the parameters are mapped to our text_generation pipeline

Curl Commands:

curl http://localhost:5543/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
        "messages": "your favourite book?",
        "max_tokens": 30,
        "n": 2
    }'

Output:

{"id":"cmpl-1bde341534bf4bd9a6f65133dd975516","object":"chat.completion","created":1696876569,"model":"zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none","choices":[{"index":0,"message":{"role":"assistant","content":"\n\nI’m a big fan of the Hitchhiker’s Guide to the Skies. I’m also a big fan"},"finish_reason":"length"}],"usage":{"prompt_tokens":2,"total_tokens":4,"completion_tokens":2}}

OpenAI API


import openai

# Point the client at the local server; the API key is only a placeholder.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:5543/v1"

# Chat Completion API
stream = False
completion = openai.ChatCompletion.create(
    messages="how's your day going?",
    stream=stream,
    max_tokens=30,
    model="zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
)

print("Chat results:")
if stream:
    # With stream=True the call returns an iterator of chunks.
    for c in completion:
        print(c)
else:
    print(completion)
  • Without streaming
Chat results:
{
  "id": "cmpl-d960c87b055c4c75ba01c448e2cfa49e",
  "object": "chat.completion",
  "created": 1696876680,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\nI'm doing pretty well. I'm going to be going to the gym tomorrow. I'm going to be going to the gym tomorrow. I"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 4,
    "completion_tokens": 2
  }
}

  • With streaming (set stream flag to True)
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": "I"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": "'m"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " doing"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " pretty"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " well"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": "."
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " I"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": "'m"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " going"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " to"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " be"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " going"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " to"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " the"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " gym"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " tomorrow"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": "."
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " I"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": "'m"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " going"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " to"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " be"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " going"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " to"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " the"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " gym"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " tomorrow"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": "."
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": " I"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "cmpl-d8ad0b3c3f8d4b3790a7fcad1221af73",
  "object": "chat.completion.chunk",
  "created": 1696879356,
  "model": "zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": null,
        "content": ""
      },
      "finish_reason": "length"
    }
  ]
}
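
To reassemble the generated text from streamed chunks like the ones above, a client can concatenate the `delta.content` pieces. The sketch below assumes the chunks are available as plain dictionaries with the same shape as the output shown; `join_stream` is a hypothetical helper and not part of this PR.

```python
from typing import Any, Dict, Iterable


def join_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the delta content of choice 0 across streamed chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        content = delta.get("content")
        if content:  # the first chunk only carries the role; the last one is ""
            parts.append(content)
    return "".join(parts)


# A shortened version of the stream shown above.
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"role": None, "content": "I"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"role": None, "content": "'m"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"role": None, "content": " doing"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"role": None, "content": ""}, "finish_reason": "length"}]},
]

print(join_stream(sample_chunks))  # I'm doing
```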

…for chat completion streaming and non streaming
@dsikka dsikka changed the base branch from main to match_mlserver September 28, 2023 22:06
@dsikka dsikka marked this pull request as ready for review September 29, 2023 21:48
@dsikka dsikka requested review from Satrat, mgoin, bfineran, dbogunowicz and rahul-tuli and removed request for Satrat October 2, 2023 19:04
Member

@bfineran bfineran left a comment

looks great - especially feeling good that we were able to migrate the existing tests

* update server

* allow users to send requests with new models

* use v1; move around baseroutes

* add openai path

* PR comments
@dsikka dsikka requested a review from bfineran October 9, 2023 19:55
Member

@rahul-tuli rahul-tuli left a comment

Nice, clean code! One small nit: there are places where Google docstring format is used, whereas we prefer rst in the rest of our codebase. Not a blocker from my side!

@mgoin
Member

mgoin commented Oct 10, 2023

@dsikka towards the point of OpenAI not having a do_sample argument, we should set do_sample=True when any of the sampling parameters are provided i.e. top_k, top_p. OpenAI's equivalent of do_sample=False is simply temperature=0
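
One possible reading of this suggestion, as a rough sketch: `infer_do_sample` is a hypothetical helper, and letting temperature=0 take precedence over top_k/top_p is an assumption rather than something decided in this thread.

```python
from typing import Optional


def infer_do_sample(
    temperature: Optional[float] = None,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> bool:
    """Hypothetical helper: derive do_sample from OpenAI-style sampling fields."""
    if temperature == 0:
        # Assumption: temperature=0 means greedy decoding, even if
        # top_k or top_p were also provided.
        return False
    # Sample whenever one of the sampling parameters was provided.
    return top_k is not None or top_p is not None


print(infer_do_sample(top_p=0.9))      # True
print(infer_do_sample(temperature=0))  # False
```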

@dsikka dsikka merged commit d99a82c into match_mlserver Oct 10, 2023
@dsikka dsikka deleted the openai_support branch October 10, 2023 13:50
raise ValueError(
    f"{integration} is not a supported integration. Must be "
    "one of local, sagemkaer or openai."
)
Contributor

nit: typo, also maybe worth defining a list of supported integrations as a global constant? Then we can just print out the list of supported integrations instead of having this as a hardcoded comment
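
A minimal sketch of what this could look like, assuming the three integration names from the error message. `SUPPORTED_INTEGRATIONS` and `validate_integration` are hypothetical names, not identifiers from the PR.

```python
# Hypothetical module-level constant listing the supported integrations.
SUPPORTED_INTEGRATIONS = ("local", "sagemaker", "openai")


def validate_integration(integration: str) -> str:
    """Return the integration name, or raise if it is not supported."""
    if integration not in SUPPORTED_INTEGRATIONS:
        # The message is built from the constant, so it cannot drift
        # out of sync with the actual list (or pick up typos).
        raise ValueError(
            f"{integration} is not a supported integration. Must be "
            f"one of: {', '.join(SUPPORTED_INTEGRATIONS)}."
        )
    return integration


print(validate_integration("openai"))  # openai
```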

Comment on lines +48 to +51
OPENAI_TO_DEEPSPARSE_MAPPINGS = {
    "max_tokens": "max_length",
    "frequency_penalty": "repetition_penalty",
}
Contributor

nit: is there a reason that we don't just use the openai namings?

Comment on lines +21 to +31
"""The output data of one completion output of a request.

Args:
    index: The index of the output in the request.
    text: The generated output text.
    token_ids: The token IDs of the generated output text.
    cumulative_logprob: The cumulative log probability of the generated
        output text.
    logprobs: The log probabilities of the top probability words at each
        position if the logprobs are requested.
    finish_reason: The reason why the sequence is finished.
Contributor

Another nit but this is a different docstring format than I've seen in other parts of our codebase. Would expect something like this:

"""
blah blah blah

:param param1:
:param param2:
"""

Comment on lines +57 to +65
# For deepsparse endpoints, we bind the `predict` and `predict_from_files` functions to
# each of the added routes. As we require access to the pipeline to run inference on
# each request, instead of binding `predict`, we bind `partial(predict, pipeline ...)`
# so that there is access to the pipeline when handling each request. However, fastapi
# has trouble with validating the pipeline type. As a workaround, we can wrap each
# pipeline as a `ProxyPipeline` which can be resolved by fastapi.
class ProxyPipeline:
    def __init__(self, pipeline: Pipeline):
        self.pipeline = pipeline
Contributor

nice :)
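
For readers unfamiliar with the pattern, the binding described in the inline comment can be sketched without fastapi. `EchoPipeline` and this simplified `predict` signature are stand-ins made up for illustration; only `ProxyPipeline` and the use of `partial` come from the diff, and the actual route registration is not shown.

```python
from functools import partial
from typing import Any, Callable


class EchoPipeline:
    """Stand-in for a real pipeline (hypothetical, for illustration only)."""

    def __call__(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ProxyPipeline:
    """Thin wrapper so the handler receives an opaque object."""

    def __init__(self, pipeline: Any):
        self.pipeline = pipeline


def predict(proxy_pipeline: ProxyPipeline, request: str) -> str:
    # Unwrap the pipeline that was bound when the route was created.
    return proxy_pipeline.pipeline(request)


# Bind the pipeline once; the resulting callable only needs the request.
handler: Callable[[str], str] = partial(predict, ProxyPipeline(EchoPipeline()))

print(handler("Hey there!"))  # echo: Hey there!
```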

dsikka added a commit that referenced this pull request Oct 11, 2023
* update/clean-up server to match mlserver docs

* update server tests

* add back ping

* [server] Refactor + OpenAI Chat Completion Support (#1288)

* refactor server for different integrations; additional functionality for chat completion streaming and non streaming

* further refactor server

* add support such that openai can host multiple models

* update all tests

* fix output for n > 1

* add inline comment explaining ProxyPipeline

* [server] Update OpenAI Model Support (#1300)

* update server

* allow users to send requests with new models

* use v1; move around baseroutes

* add openai path

* PR comments

* clean-up output classes to be dataclasses, add docstrings, cleanup generation kwargs

* update readme, update route cleaning, update docstring

* fix README for QA
dsikka added a commit that referenced this pull request Oct 11, 2023
* update/clean-up server to match mlserver docs

* update server tests

* add back ping

* [server] Refactor + OpenAI Chat Completion Support (#1288)

* refactor server for different integrations; additional functionality for chat completion streaming and non streaming

* further refactor server

* add support such that openai can host multiple models

* update all tests

* fix output for n > 1

* add inline comment explaining ProxyPipeline

* [server] Update OpenAI Model Support (#1300)

* update server

* allow users to send requests with new models

* use v1; move around baseroutes

* add openai path

* PR comments

* clean-up output classes to be dataclasses, add docstrings, cleanup generation kwargs

* update readme, update route cleaning, update docstring

* fix README for QA

* add openai doc

* update docs

* Update src/deepsparse/server/openai.md

Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>