Integrate Streaming #10

Merged: 10 commits merged into main on Sep 21, 2023
Conversation

@ifrit98 (Contributor) commented Sep 15, 2023:

  • Adds a template for streaming servers (a minimal sketch of the idea follows this list)
  • Adds documentation for it
  • Also fixes licensing headers
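
For context, the template builds on an ASGI-style handler that emits tokens as HTTP response body chunks as the model produces them. Below is a minimal, hypothetical sketch of that pattern; the handler name and the model/tokenizer objects are placeholders, not the template's actual interface.

# Hypothetical sketch of a token-streaming ASGI handler; `stream_tokens`,
# `model`, and `tokenizer` are placeholder names, not this PR's real API.
async def stream_tokens(send, text, model, tokenizer):
    # Send the response headers once, before any body chunks.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    input_ids = tokenizer(text, return_tensors="pt").input_ids.squeeze()
    # Stream each decoded token as its own body chunk; more_body=True
    # keeps the response open for the next chunk.
    for token in model(input_ids):
        await send({
            "type": "http.response.body",
            "body": (token + "\n").encode("utf-8"),
            "more_body": True,
        })
    # An empty final chunk with more_body=False closes the stream.
    await send({"type": "http.response.body", "body": b"", "more_body": False})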

@isabella618033 (Collaborator) left a comment:

LGTM! Left a few questions.

await send({"type": "http.response.body", "body": (token + '\n').encode('utf-8'), "more_body": True})
bt.logging.trace(f"Streamed token: {token}")
# Sleep to show the streaming effect
await asyncio.sleep(1)
isabella618033 (Collaborator):

is sleep here necessary for the final implementation?

ifrit98 (Contributor, Author):

Crap, no, that was for testing. Nice catch!

# Simulate model inference
input_ids = tokenizer(text, return_tensors="pt").input_ids.squeeze()
# Iterate over the decoded tokens and send them back to the client.
for token in model(input_ids):
isabella618033 (Collaborator):

So we are sending token by token?

Out of curiosity, how would the speed change if we sent 2 tokens at a time?

ifrit98 (Contributor, Author):

It would probably be better; I will test this and amend.

ifrit98 (Contributor, Author):

Fixed

# Iterate over the decoded tokens and send them back to the client.
for token in model(input_ids):
# Send token back to the client
await send({"type": "http.response.body", "body": (token + '\n').encode('utf-8'), "more_body": True})
isabella618033 (Collaborator):

Is the newline necessary after each token?

ifrit98 (Contributor, Author):

The send() is required for every chunk, but it would be up to the miners to implement their own buffering logic and determine an appropriate chunk size. Assume they'd buffer more than one token at a time and send in chunks.

I'll make an update to reflect this point. Thanks!
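
To make the buffering point concrete, here is a rough, hypothetical sketch of miner-side chunking; the chunk size, function name, and token iterator are illustrative and not part of this PR.

# Hypothetical buffering logic: accumulate several tokens per send()
# instead of one. CHUNK_SIZE is an illustrative value a miner would tune.
CHUNK_SIZE = 8

async def stream_in_chunks(send, token_iter):
    buffer = []
    for token in token_iter:
        buffer.append(token + "\n")
        if len(buffer) >= CHUNK_SIZE:
            # Flush the buffered tokens as a single body chunk.
            await send({
                "type": "http.response.body",
                "body": "".join(buffer).encode("utf-8"),
                "more_body": True,
            })
            buffer = []
    # Flush whatever is left and close the stream with more_body=False.
    await send({
        "type": "http.response.body",
        "body": "".join(buffer).encode("utf-8"),
        "more_body": False,
    })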

ifrit98 merged commit e3de53c into main on Sep 21, 2023
0 of 4 checks passed