
Add CFG to vllm serving #517

Merged
merged 1 commit into from
Jan 12, 2024
Conversation

@mory91 (Contributor) commented on Jan 10, 2024

Hi,
This pull request adds support for CFG in vLLM serving.

@@ -122,6 +127,8 @@ async def stream_results() -> AsyncGenerator[bytes, None]:
# Sets default for the model (`facebook/opt-125m`)
engine = AsyncLLMEngine.from_engine_args(engine_args)

_adapt_tokenizer(engine.engine.tokenizer)
Member commented:

Why are you calling this function here? The result is not used.

@mory91 (Contributor, author) replied:

The tokenizer is changed inside the function anyway. I have now assigned the result back to the tokenizer, though.

Member replied:

Ah, that makes sense. It isn't needed, however, as vLLM handles tokenisation on its end during encoding/decoding.

@mory91 (Contributor, author) replied on Jan 12, 2024:

It's needed because of this line:

self.generation += self.tokenizer.decode([token_id])[0]

outlines expects the tokenizer to return a list, but vLLM tokenizers return a string. Also, I just realized that this is a breaking change to the library. If that is fine with the project maintainers, it's fine; if not, we may need a different approach. For example, we could call _adapt_tokenizer inside the __init__ methods of the logits processors.
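The list-vs-string mismatch described above can be sketched as a small in-place wrapper. This is a hypothetical illustration only, not the actual `_adapt_tokenizer` from the PR; the function and class names below are assumptions made for the example.

```python
# Hypothetical sketch of a decode-adapting wrapper. The real _adapt_tokenizer
# in outlines may differ; this only illustrates the list-vs-string mismatch.
def adapt_tokenizer(tokenizer):
    """Patch a tokenizer in place so decode() returns a list of strings,
    matching what the logits processors expect, and return the tokenizer."""
    original_decode = tokenizer.decode

    def decode(token_ids):
        # vLLM-style tokenizers return a plain string; wrap it in a list
        # so callers can index the result with [0].
        return [original_decode(token_ids)]

    tokenizer.decode = decode
    return tokenizer


class _DummyTokenizer:
    """Minimal stand-in for a vLLM tokenizer whose decode returns a string."""

    def decode(self, token_ids):
        return "".join(chr(97 + i) for i in token_ids)


tok = adapt_tokenizer(_DummyTokenizer())
print(tok.decode([0, 1, 2]))  # a one-element list containing the decoded string
```

Returning the patched tokenizer (rather than mutating it silently) also matches the change discussed above, where the result of the call is assigned back to the tokenizer.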

@rlouf (Member) commented on Jan 12, 2024

Thank you for your contributions! I added some documentation before merging.

@rlouf rlouf merged commit fde61a8 into outlines-dev:main Jan 12, 2024
4 checks passed
@lapp0 mentioned this pull request on Jan 13, 2024
Labels: structured generation (Linked to structured generation), vLLM (Things involving vLLM support)
Linked issue: Add CFG guided generation to vLLM integration
2 participants