Adding outlines #1
Conversation
I think I'll pull from outlines-dev/outlines#781, which will probably solve 1 and 3
Thanks, looking good so far... it's nice that outlines already supports exl2
@edk208 Some notes
So in summary, I think these are all the changes that can work from the main branch of outlines so far. Happy to get feedback!
I'll do the streaming idea tonight
What do you mean by the "logic of first doing preprocess and then generating tokens"? Do you mean the first model.forward with preprocess_only=True?
@edk208 Sorry for the confusion, and yes. To my understanding, the process is
I think step 1 is technically not possible in outlines, but steps 2 and 3 might be possible in the above PR. Let me try it tomorrow
@isamu-isozaki yes that's correct. The preprocess runs the prompts through and sets up the KV cache, then you can round-robin through them and generate one token at a time. Interesting that outlines doesn't like step 1. I would imagine it would have to do that anyway. I can take a look too in the next few days.
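The batching scheme described above — preprocess every prompt to fill the KV cache, then round-robin through the sequences one token at a time — can be sketched in pure Python. This is only an illustration: `ToyModel`, `preprocess`, and `next_token` are hypothetical stand-ins, not the actual ExLlama or outlines API.

```python
# Toy sketch of preprocess-then-round-robin batched generation.
# The "KV cache" here is just the running token list per sequence;
# a real model would hold attention key/value tensors instead.

class ToyModel:
    def preprocess(self, prompt):
        # Real code would run a forward pass in preprocess-only mode
        # to populate the KV cache for this prompt.
        return list(prompt)  # "cache" = tokens seen so far

    def next_token(self, cache):
        # Real code would do a single-token forward pass using the cache.
        return f"t{len(cache)}"

def generate_batch(model, prompts, max_new_tokens):
    # Step 1: preprocess every prompt up front (fills each KV cache).
    caches = [model.preprocess(p) for p in prompts]
    outputs = [[] for _ in prompts]
    # Steps 2-3: round-robin over sequences, one token each per pass.
    for _ in range(max_new_tokens):
        for i, cache in enumerate(caches):
            tok = model.next_token(cache)
            cache.append(tok)   # extend this sequence's cache
            outputs[i].append(tok)
    return outputs

print(generate_batch(ToyModel(), [["a"], ["b", "c"]], 2))
# -> [['t1', 't2'], ['t2', 't3']]
```

The point of the round-robin loop is that each sequence advances by exactly one token per pass, so streaming responses for all clients progress together instead of one request blocking the rest.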
Hi! I think the main logic is done. For the test I used config.ini
with the model from here
Then on the client side, I did:

```python
import json

from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage, AIMessage
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    temperature=1.0,
    openai_api_base="http://localhost:5000/v1",
    openai_api_key="Test",
    streaming=True,
    max_tokens=1024,
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Who is more impressive? Bob or Fred?"),
]

choices = ["Bob", "Fred"]
for chunk in llm.stream(
    messages,
    extra_body={"stop_at": "done", "outlines_type": "choices", "choices": choices},
):
    print(chunk.content, end="", flush=True)
```

which got me Bob. I can do more tests if you want, but I think it's working. One main point here is that, for adding new parameters to the OpenAI API, we use extra_body rather than function calling/tool calling, since I couldn't think of an easy way to translate it.
This is a draft PR. Currently, the 3 main parts left to do to make this work are