Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[server] Update server routes to be compliant with MLServer #1237

Merged
merged 16 commits into from
Oct 11, 2023
Merged

Conversation

dsikka
Copy link
Contributor

@dsikka dsikka commented Sep 11, 2023

Summary:

Testing

  • Updated all the server tests
  • Tested locally using custom routes as well as routes built for the user given the model name

Sample Config:

num_cores: 2
num_workers: 2
endpoints:
  - task: question_answering
    model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni
    route: some_route/pruned

This will now produce the following endpoints:
Screenshot 2023-10-10 at 11 59 39 AM

@dsikka dsikka marked this pull request as ready for review September 11, 2023 21:10
src/deepsparse/server/server.py Outdated Show resolved Hide resolved
src/deepsparse/server/server.py Outdated Show resolved Hide resolved
Copy link
Member

@bfineran bfineran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall - as discussed offline, agree with both @dsikka and @Satrat it's time we move route creation to a separate file / fleshed out system

src/deepsparse/server/server.py Show resolved Hide resolved
dbogunowicz
dbogunowicz previously approved these changes Sep 13, 2023
Copy link
Contributor

@dbogunowicz dbogunowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, clean

bfineran
bfineran previously approved these changes Oct 6, 2023
Copy link
Member

@bfineran bfineran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - as mentioned before we need to sync w/ QA before landing

Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would cause breaking changes to any application built using the old endpoint structure. I think if you could make a README or document showing how to transition current examples of usage, that would be helpful for other teams and users to have a summary of the changes.
For instance, how should we update this Digital Ocean getting started guide? https://marketplace.digitalocean.com/apps/deepsparse-inference-runtime

* refactor server for different integrations; additional functionality for chat completion streaming and non streaming

* further refactor server

* add support such that openai can host multiple models

* update all tests

* fix output for n > 1

* add inline comment explaining ProxyPipeline

* [server] Update OpenAI Model Support (#1300)

* update server

* allow users to send requests with new models

* use v1; move around baseroutes

* add openai path

* PR comments

* clean-up output classes to be dataclasses, add docstrings, cleanup generation kwargs
src/deepsparse/server/cli.py Outdated Show resolved Hide resolved
src/deepsparse/server/openai_server.py Show resolved Hide resolved
src/deepsparse/server/output.py Outdated Show resolved Hide resolved
bfineran
bfineran previously approved these changes Oct 10, 2023
Satrat
Satrat previously approved these changes Oct 10, 2023
@dsikka dsikka dismissed stale reviews from Satrat and bfineran via c550799 October 10, 2023 19:11
@dsikka dsikka requested a review from bfineran October 10, 2023 19:36
@dsikka dsikka merged commit 639e9e4 into main Oct 11, 2023
12 of 13 checks passed
@dsikka dsikka deleted the match_mlserver branch October 11, 2023 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants