[server] Refactor + OpenAI Chat Completion Support #1288
Conversation
refactor server for different integrations; additional functionality for chat completion streaming and non streaming
looks great - especially feeling good that we were able to migrate the existing tests
* update server
* allow users to send requests with new models
* use v1; move around baseroutes
* add openai path
* PR comments
Nice, clean code! One small nit is that there are places where Google docstring format is used, whereas we prefer rst in the rest of our codebase; but it's not a blocker from my side!
@dsikka towards the point of OpenAI not having a `do_sample` argument, we should set …
```python
raise ValueError(
    f"{integration} is not a supported integration. Must be "
    "one of local, sagemkaer or openai."
)
```
nit: typo; also, maybe worth defining a list of supported integrations as a global constant? Then we can just print out the list of supported integrations instead of hardcoding it in the message.
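A minimal sketch of that suggestion (the constant and helper names here are assumed, not from the PR):

```python
# Hypothetical module-level constant listing the supported integrations.
SUPPORTED_INTEGRATIONS = ["local", "sagemaker", "openai"]

def validate_integration(integration: str) -> str:
    if integration not in SUPPORTED_INTEGRATIONS:
        raise ValueError(
            f"{integration} is not a supported integration. Must be "
            f"one of {SUPPORTED_INTEGRATIONS}."
        )
    return integration
```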
```python
OPENAI_TO_DEEPSPARSE_MAPPINGS = {
    "max_tokens": "max_length",
    "frequency_penalty": "repetition_penalty",
}
```
nit: is there a reason we don't just use the OpenAI namings?
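For context, a sketch of how such a mapping could be applied when translating OpenAI-style kwargs into pipeline kwargs (the helper name is assumed, not from the PR):

```python
OPENAI_TO_DEEPSPARSE_MAPPINGS = {
    "max_tokens": "max_length",
    "frequency_penalty": "repetition_penalty",
}

def to_pipeline_kwargs(request_kwargs: dict) -> dict:
    # Rename OpenAI parameters that have a deepsparse equivalent;
    # pass the rest through unchanged.
    return {
        OPENAI_TO_DEEPSPARSE_MAPPINGS.get(key, key): value
        for key, value in request_kwargs.items()
    }

assert to_pipeline_kwargs({"max_tokens": 64, "temperature": 0.7}) == {
    "max_length": 64,
    "temperature": 0.7,
}
```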
"""The output data of one completion output of a request. | ||
|
||
Args: | ||
index: The index of the output in the request. | ||
text: The generated output text. | ||
token_ids: The token IDs of the generated output text. | ||
cumulative_logprob: The cumulative log probability of the generated | ||
output text. | ||
logprobs: The log probabilities of the top probability words at each | ||
position if the logprobs are requested. | ||
finish_reason: The reason why the sequence is finished. |
Another nit, but this is a different docstring format than I've seen in other parts of our codebase. Would expect something like this:

```python
"""
blah blah blah

:param param1:
:param param2:
"""
```
```python
# For deepsparse endpoints, we bind the `predict` and `predict_from_files` functions
# to each of the added routes. As we require access to the pipeline to run inference
# on each request, instead of binding `predict`, we bind `partial(predict, pipeline, ...)`
# so that there is access to the pipeline when handling each request. However, fastapi
# has trouble with validating the pipeline type. As a workaround, we can wrap each
# pipeline as a `ProxyPipeline` which can be resolved by fastapi.
class ProxyPipeline:
    def __init__(self, pipeline: Pipeline):
        self.pipeline = pipeline
```
nice :)
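For context, a minimal sketch of the binding pattern that comment describes; the route, handler, and model stub are illustrative assumptions, not the PR's actual code (it reuses the `ProxyPipeline` class from the hunk above):

```python
from functools import partial

from deepsparse import Pipeline
from fastapi import FastAPI

app = FastAPI()

def predict(proxy_pipeline: ProxyPipeline, request: dict):
    # `proxy_pipeline` is pre-bound via functools.partial below, so the route
    # signature that fastapi validates only contains `request`.
    return proxy_pipeline.pipeline(**request)

pipeline = Pipeline.create(task="text_generation", model_path="zoo:...")  # stub elided
app.add_api_route("/predict", partial(predict, ProxyPipeline(pipeline)), methods=["POST"])
```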
* update/clean-up server to match mlserver docs
* update server tests
* add back ping
* [server] Refactor + OpenAI Chat Completion Support (#1288)
* refactor server for different integrations; additional functionality for chat completion streaming and non streaming
* further refactor server
* add support such that openai can host multiple models
* update all tests
* fix output for n > 1
* add inline comment explaining ProxyPipeline
* [server] Update OpenAI Model Support (#1300)
* update server
* allow users to send requests with new models
* use v1; move around baseroutes
* add openai path
* PR comments
* clean-up output classes to be dataclasses, add docstrings, cleanup generation kwargs
* update readme, update route cleaning, update docstring
* fix README for QA
* add openai doc
* update docs
* Update src/deepsparse/server/openai.md

Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Summary

Couple of notes about the current integration:

* The `/v1/chat/completion` endpoint is supported (and also now the `/v1/models` endpoint, as a result of merging in: [server] Update OpenAI Model Support #1300)
* `ChatCompletionRequest` is what is expected by the endpoints. Not all the properties supported by the pipeline are supported by the request and vice versa. For example, the request does not seem to have a way to set `do_sample`, whereas our pipeline does not currently support `"logit_bias"`, `"best_of"`, `"ignore_eos"`, `"use_beam_search"`.
* We map attributes from the `ChatCompletionRequest` to the text generation pipeline using the `OPENAI_TO_DEEPSPARSE_MAPPINGS` shown above.
* … the `/v1/chat/completions` endpoint

Unit Testing
Testing and Examples - Existing Server
Sample Config:
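A minimal sketch of what such a config could look like, assuming the standard `deepsparse.server` YAML schema (model stub elided):

```yaml
# Hypothetical sample_config.yaml; field names assume the deepsparse.server schema.
num_workers: 2
endpoints:
  - task: text_generation
    model: zoo:...  # model stub elided
```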
Launch a server:

```bash
deepsparse.server --config_file sample_config.yaml
```
Requests can be sent the same way they were sent previously:
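For example, a sketch using the `requests` library; the port, route, and input field are assumptions based on server defaults, not taken from this PR:

```python
import requests

# Port 5543 is the assumed deepsparse.server default; route and field name assumed.
url = "http://localhost:5543/v2/models/text_generation/infer"
response = requests.post(url, json={"sequences": "Write a haiku about sparsity."})
print(response.json())
```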
Testing and Examples - OpenAI

* `OpenAI` can be used through the integration input, similar to how `sagemaker` was used before: `deepsparse.server --config_file sample_config.yaml --integration openai`. There is also a dedicated workflow, through the `deepsparse.openai sample_config.yaml` command.
* `task_generation` (and its aliases) are supported. An example server config is shown below.
* Requests can be sent using the `requests` library, curl commands, or through the OpenAI API.
* The expected input is the `ChatCompletionRequest` given by OpenAI, and all the parameters are mapped to our `text_generation` pipeline.
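A minimal sketch of such a config, under the same assumed schema as above:

```yaml
# Hypothetical config for the OpenAI integration.
endpoints:
  - task: text_generation  # or one of its supported aliases
    model: zoo:...  # model stub elided
```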
Curl Commands:
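A plausible shape for such a request, assuming the default port and the OpenAI chat-completions schema (model name elided):

```bash
curl http://localhost:5543/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "zoo:...",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```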
Output:
OpenAI API (streaming can be enabled by setting the `stream` flag to `True`):
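A sketch of driving the server through the OpenAI Python client, assuming pre-1.0 `openai` package conventions; the base URL, key handling, and model name are assumptions:

```python
import openai

openai.api_base = "http://localhost:5543/v1"  # point the client at the local server (assumed port)
openai.api_key = "unused"  # assumption: the local server does not validate the key

response = openai.ChatCompletion.create(
    model="zoo:...",  # model stub elided
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # stream chunks back as they are generated
)
for chunk in response:
    print(chunk)
```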