
Dynamic generation with outlines #14

Merged (7 commits) on Jul 4, 2024

Conversation

@isamu-isozaki (Contributor) commented Jul 2, 2024

This is a PR to adapt outlines for exllamav2 dynamic generation. I think this will remove the need for that PR, since exllamav2 should do this under the hood.

To run this, you need to install outlines from my branch using

pip uninstall outlines
pip install --no-cache-dir git+https://github.com/isamu-isozaki/outlines.git@exllamav2_filter

This is also a pending PR in outlines itself, so in the future

pip install outlines

might be enough.
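As a quick sanity check (not part of the original instructions), you can confirm which outlines build ended up installed:

# Hypothetical check, not from the PR: print the version of the installed outlines package
import importlib.metadata
print(importlib.metadata.version("outlines"))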

I started the server with

python llm_exl2_dynamic_gen.py --port 5000 --repo_str phi3b --max_context 2048 --total_context 4096 --not_paged

and tested code for

  1. Constrained generation for choices
  2. Constrained generation for json
  3. Constrained generation for regex
  4. Checking if the stop_at keyword works
  5. Making sure the server doesn't error on keyboard interrupt

They all seem to work. I'll add the code for these tests in the comments.

@isamu-isozaki (Contributor, Author)

Start the client

from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage, AIMessage
import json
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=1.0,
                 openai_api_base="http://localhost:5000/v1",
                 openai_api_key="Test",
                 streaming=True,
                 max_tokens=1024)

Test choices

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content="Who is better bob or fred?"
    )
]
for chunk in llm.stream(messages, extra_body={"outlines_type": "choices", "choices": ["bob", "fred"]}):
    print(chunk.content, end="", flush=True)

Output:

bob
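For reference, here is the same choices request made without langchain. This is a minimal sketch, assuming the server exposes the usual OpenAI-compatible /v1/chat/completions route and reads the extra fields (outlines_type, choices) from the request body; the model name and Authorization header are placeholders, not confirmed by the PR.

import requests

# Minimal sketch, not from the PR: send the choices request directly to the
# OpenAI-compatible endpoint. "model" and the Authorization header are assumptions.
payload = {
    "model": "phi3b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who is better bob or fred?"},
    ],
    "max_tokens": 1024,
    "outlines_type": "choices",
    "choices": ["bob", "fred"],
}
resp = requests.post(
    "http://localhost:5000/v1/chat/completions",
    headers={"Authorization": "Bearer Test"},
    json=payload,
)
print(resp.json()["choices"][0]["message"]["content"])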

Test JSON

from enum import Enum
from pydantic import BaseModel, constr
import json

class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    armor: Armor
    weapon: Weapon
    strength: int

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content=f"Give me an interesting character description based on the following schema: {json.dumps(Character.schema())}"
    )
]
for chunk in llm.stream(messages, extra_body={"outlines_type": "json", "json": json.dumps(Character.schema())}):
    print(chunk.content, end="", flush=True)

Output:

{ "name": "Eldric the" , "age": 37, "armor": "chainmail", "weapon": "sword", "strength": 87 }

Test Regex

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content=f"Choose between bob and fred."
    )
]
for chunk in llm.stream(messages, extra_body={"outlines_type": "regex", "regex": "bob|fred"}):
    print(chunk.content, end="", flush=True)

Output:

bob

Test stop_at keyword

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content="Instruction: Always answer Questions in the form Question: What is 2+1?\nwith\nAnswer: 2+1=3\nQuestion: What is 21+1?\n"
    )
]
for chunk in llm.stream(messages, extra_body={"outlines_type": "text", "stop_at": "+"}):
    print(chunk.content, end="", flush=True)

To test keyboard interrupt handling, I ran the code below, interrupted it during generation, and then ran it again, and it seemed to work.

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content="What is your name?"
    )
]
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)

@isamu-isozaki (Contributor, Author)

Let me know if there should be more tests!

isamu-isozaki and others added 6 commits July 3, 2024 15:43
…OS compatible with Llama3 and 'stop_at' string from outlines
…_enhance_stop_condition

Enhance ExllamaV2Sampler with Temperature Parameter and Update EOS Token for Llama3 Compatibility
@edk208 merged commit 9c4c893 into blockentropy:main on Jul 4, 2024