
server: use chunked inputs #1985

Merged: 1 commit merged into main from feature/server-chunks on Jun 7, 2024

Conversation

danieldk (Member)

What does this PR do?

The router now sends the input as chunks in addition to a single string. This change modifies the server to process chunked input rather than strings, which also allows us to remove the image extraction code from the server.
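
For illustration, here is how such a chunked input could be constructed on the Python side. This is a hypothetical sketch: the message and field names (Input, InputChunk, Image, data, mimetype) are assumptions about the generate_pb2 protobuf schema and may not match it exactly.

from text_generation_server.pb import generate_pb2

# Raw bytes for the image chunk (the path is illustrative).
with open("cat.png", "rb") as f:
    image_bytes = f.read()

# A text chunk followed by an image chunk, as a VLM request might contain.
inp = generate_pb2.Input(
    chunks=[
        generate_pb2.InputChunk(text="What is in this image? "),
        generate_pb2.InputChunk(
            image=generate_pb2.Image(data=image_bytes, mimetype="image/png")
        ),
    ]
)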

Draft note: mostly checking whether all models pass; this also needs to be rebased on #1981 once that PR is merged.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@danieldk force-pushed the feature/server-chunks branch 2 times, most recently from 544ddf5 to 64bea24 on May 31, 2024 12:00
@danieldk marked this pull request as ready for review on June 3, 2024 15:59
from typing import Iterable

from text_generation_server.pb import generate_pb2


def concat_text_chunks(chunks: Iterable[generate_pb2.InputChunk]) -> str:
Member:

Is this method only to be future proof or is there a way today to have multiple text chunks?

Member Author (danieldk):

AFAIK we can currently only have multiple text chunks in VLM models, so this was indeed only to future-proof.

Member:

Then maybe we should take [0] and crash with an unreachable if len > 1?

Member Author (danieldk):

Oh, we do need to iterate over the chunks, because we send image chunks unconditionally during warmup, even for text-only models.

The current approach seems more robust. What do you think about logging a warning when len(texts) > 1?

Member Author (danieldk):

Updated to:

  • Fail when there is more than one text chunk.
  • Fail when there is no text chunk.
  • Log at debug level when there is a non-text chunk (only log, because e.g. warmup sends an image chunk); see the sketch below.
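
A minimal sketch of the resulting helper, matching the concat_text_chunks signature shown in the diff above. Assumptions: InputChunk exposes its variants through a protobuf oneof named chunk, and loguru is available for logging; the actual implementation in the PR may differ in details.

from typing import Iterable

from loguru import logger

from text_generation_server.pb import generate_pb2


def concat_text_chunks(chunks: Iterable[generate_pb2.InputChunk]) -> str:
    """Return the single text chunk, rejecting malformed inputs."""
    text = None
    for chunk in chunks:
        chunk_type = chunk.WhichOneof("chunk")
        if chunk_type == "text":
            if text is not None:
                # Fail when there is more than one text chunk.
                raise NotImplementedError("Request contains more than one text chunk")
            text = chunk.text
        else:
            # Only log here: warmup sends an image chunk even for
            # text-only models, so non-text chunks cannot be rejected.
            logger.debug(f"Encountered non-text chunk type {chunk_type}")

    if text is None:
        # Fail when there is no text chunk.
        raise NotImplementedError("Request does not contain a text chunk")

    return text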

OlivierDehaene previously approved these changes on Jun 6, 2024
Commit message: The router now sends the input as chunks in addition to a single string. This change modifies the server to process chunked input rather than strings, which also allows us to remove the image extraction code from the server.
@danieldk merged commit bf3c813 into main on Jun 7, 2024 (5 checks passed)
@danieldk deleted the feature/server-chunks branch on June 7, 2024 06:09