Adding outlines #1

Merged: 11 commits into blockentropy:main, Apr 29, 2024

Conversation

isamu-isozaki (Contributor) commented Apr 23, 2024

This is a draft PR. Currently, the three main parts left to make this work are:

  • Add support for a KV cache in outlines, or in a fork of outlines (this was already handled by exllama2)
  • Add support to the API for specifying what kind of generation we want (choice, JSON, Pydantic class, regex, etc.)
  • Possibly add support for streaming to outlines, as is currently done in the main repo of outlines

isamu-isozaki marked this pull request as draft on April 23, 2024 02:51
isamu-isozaki (Contributor, Author)

I think I'll pull from outlines-dev/outlines#781, which will probably solve 1 and 3.

edk208 (Contributor) commented Apr 23, 2024

Thanks, looking good so far... it's nice that outlines already supports exl2.

isamu-isozaki (Contributor, Author)

@edk208 Some notes:

  1. I think I finished the main logic.
  2. The flow of first doing the preprocess and then generating tokens is unfortunately not currently supported by outlines. I can make an outlines fork that supports it, but it could be a bit hacky. Does doing the preprocess first across all prompts offer better performance?
  3. The current script doesn't support proper streaming, but I can make it generate one token at a time and stream using the PR mentioned above. That functionality is not in the main branch of outlines yet, so it is more experimental.

So in summary, I think these are all the changes that can work from the main branch of outlines so far. Happy to get feedback!

isamu-isozaki marked this pull request as ready for review on April 24, 2024 20:33
isamu-isozaki marked this pull request as draft on April 24, 2024 20:34
isamu-isozaki (Contributor, Author)

I'll do the streaming idea tonight

edk208 (Contributor) commented Apr 27, 2024

What do you mean by the "logic of first doing preprocess and then generating tokens"? Do you mean the first model.forward with preprocess_only=True?

isamu-isozaki (Contributor, Author) commented Apr 27, 2024

@edk208 Sorry for the confusion, and yes. To my understanding, the process is:

  1. We get the prompts -> tokenize + preprocess in exllama2.
  2. Generate one token for each of those prompts.
  3. Stop any prompt that emits an end-of-sequence token.

All of this runs in a while loop until all prompts and prompt ids are exhausted (a rough sketch of that loop is included below).

I think step 1 is technically not possible in outlines, but steps 2 and 3 might be possible with the above PR. Let me try it tomorrow.
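
Not part of the PR itself, just a minimal sketch of the loop described above. Here preprocess, next_token, and eos_id are hypothetical stand-ins for the exllama2 preprocessing call and for the per-token, outlines-constrained decode step.

# Hypothetical sketch (not the PR's code) of the batched round-robin loop described above.
def generate_batch(prompts, preprocess, next_token, eos_id, max_new_tokens=256):
    # Step 1: tokenize + preprocess every prompt once, filling its cache/FSM state.
    states = [preprocess(p) for p in prompts]
    outputs = [[] for _ in prompts]
    active = set(range(len(prompts)))

    # Steps 2-3: one token per active prompt per pass, until every prompt hits EOS.
    while active:
        for i in list(active):
            tok = next_token(states[i])      # single constrained decode step
            if tok == eos_id or len(outputs[i]) >= max_new_tokens:
                active.discard(i)            # stop this prompt on end-of-sequence
            else:
                outputs[i].append(tok)
    return outputs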

edk208 (Contributor) commented Apr 27, 2024

@isamu-isozaki Yes, that's correct. The preprocess runs the prompts through and sets up the KV cache; then you can round-robin through them and generate one token at a time. Interesting that outlines doesn't like step 1; I would imagine it would have to do that anyway. I can take a look too in the next few days.
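
To make that concrete, here is a hedged sketch of the prefill/decode split being described. The model.forward(..., preprocess_only=True) call follows the usage mentioned in this thread, but the exact objects and signatures below are assumptions rather than the repo's actual code.

# Hypothetical illustration (assumed API) of preprocess-then-decode with an
# exllama2-style model and cache, as discussed above.
def prefill_then_step(model, cache, tokenizer, prompt):
    ids = tokenizer.encode(prompt)                        # shape [1, seq_len]

    # Prefill: push all prompt tokens through once, only to populate the KV
    # cache; no logits are needed yet.
    model.forward(ids[:, :-1], cache, preprocess_only=True)

    # Decode: from here on, each forward pass sees a single token and reuses
    # the cache, so it is cheap to round-robin across prompts and to apply
    # outlines' per-token constraints to the returned logits.
    logits = model.forward(ids[:, -1:], cache)
    return logits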

isamu-isozaki marked this pull request as ready for review on April 29, 2024 04:35
isamu-isozaki (Contributor, Author)

Hi! I think the main logic is done. For the test, I used this config.ini:

[settings]
host = 127.0.0.1
port = 12345
upload_url = https://url/api/upload
path_url = https://url/folder/

[phi3b]
string = phi3b
repo = ..../Phi-3-mini-128k-instruct-exl2

with the model from here, and I started the server with:

python llm_exl2_client_multi.py --port=5000 --use_outlines --gpu_split="5" --max_context=512 --repo_str=phi3b

Then, on the client side, I ran:

from langchain.schema import HumanMessage, SystemMessage
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=1.0,
                openai_api_base="http://localhost:5000/v1", 
                openai_api_key="Test",
                streaming=True, 
                max_tokens=1024)
messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content="Who is more impressive? Bob or Fred?"
    )
]
choices = ["Bob", "Fred"]

for chunk in llm.stream(messages, extra_body={"stop_at":"done", "outlines_type": "choices", "choices": choices}):
    print(chunk.content, end="", flush=True)

which got me Bob. I can do more tests if you want, but I think it's working. One main design decision here: to add new parameters to the OpenAI API, we pass them through extra_body rather than using function calling/tool calling, since I couldn't think of an easy way to translate it.
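
For reference, the same request can be made without LangChain through the openai v1 Python SDK, which also exposes an extra_body argument. This is not part of the PR; it is a minimal sketch, and the model name "phi3b" is only an assumption based on the repo_str used above.

# Minimal sketch (not from the PR): same extra_body fields sent via the openai
# v1 SDK. Assumes the server from the test above is running on port 5000 and
# resolves the model name "phi3b" from the repo_str setting.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="Test")

stream = client.chat.completions.create(
    model="phi3b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who is more impressive? Bob or Fred?"},
    ],
    stream=True,
    extra_body={"stop_at": "done", "outlines_type": "choices", "choices": ["Bob", "Fred"]},
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)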

isamu-isozaki changed the title from "WIP: Attempt adding outlines" to "Adding outlines" on Apr 29, 2024
edk208 merged commit 91114ea into blockentropy:main on Apr 29, 2024