
Dynamic generation with outlines #14

Merged (7 commits) on Jul 4, 2024

Conversation

@isamu-isozaki (Contributor) commented Jul 2, 2024

This is a PR to adapt outlines for exllamav2 dynamic generation. I think this will remove the need for that PR, since exllamav2 should do this under the hood.

To run this, you need to install outlines from my branch using

pip uninstall outlines
pip install --no-cache-dir git+https://github.com/isamu-isozaki/outlines.git@exllamav2_filter

This is also a pending PR in outlines itself, so in the future

pip install outlines

might be enough.
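As a quick sanity check (not part of the original instructions), you can confirm which outlines build ended up installed:

# Hypothetical check, not from the PR: print the version of the installed outlines package
import importlib.metadata
print(importlib.metadata.version("outlines"))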

I started the server with

python llm_exl2_dynamic_gen.py --port 5000 --repo_str phi3b --max_context 2048 --total_context 4096 --not_paged

and tested code for

  1. Constrained generation for choices
  2. Constrained generation for json
  3. Constrained generation for regex
  4. Checking if the stop_at keyword works
  5. Making sure the server doesn't error on keyboard interrupt

They all seem to work. I'll add the code for these tests in the comments.

@isamu-isozaki (Contributor, Author)

Start the client

from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage, AIMessage
import json
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=1.0,
                 openai_api_base="http://localhost:5000/v1",
                 openai_api_key="Test",
                 streaming=True,
                 max_tokens=1024)

Test choices

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content="Who is better bob or fred?"
    )
]
for chunk in llm.stream(messages, extra_body={"outlines_type": "choices", "choices": ["bob", "fred"]}):
    print(chunk.content, end="", flush=True)

Output:

bob
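For reference, here is the same choices request made without langchain. This is a minimal sketch, assuming the server exposes the usual OpenAI-compatible /v1/chat/completions route and reads the extra fields (outlines_type, choices) from the request body; the model name and Authorization header are placeholders, not confirmed by the PR.

import requests

# Minimal sketch, not from the PR: send the choices request directly to the
# OpenAI-compatible endpoint. "model" and the Authorization header are assumptions.
payload = {
    "model": "phi3b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who is better bob or fred?"},
    ],
    "max_tokens": 1024,
    "outlines_type": "choices",
    "choices": ["bob", "fred"],
}
resp = requests.post(
    "http://localhost:5000/v1/chat/completions",
    headers={"Authorization": "Bearer Test"},
    json=payload,
)
print(resp.json()["choices"][0]["message"]["content"])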

Test JSON

from enum import Enum
from pydantic import BaseModel, constr
import json

class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    armor: Armor
    weapon: Weapon
    strength: int

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content=f"Give me an interesting character description based on the following schema: {json.dumps(Character.schema())}"
    )
]
for chunk in llm.stream(messages, extra_body={"outlines_type": "json", "json": json.dumps(Character.schema())}):
    print(chunk.content, end="", flush=True)

Output:

{ "name": "Eldric the" , "age": 37, "armor": "chainmail", "weapon": "sword", "strength": 87 }

Test Regex

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content=f"Choose between bob and fred."
    )
]
for chunk in llm.stream(messages, extra_body={"outlines_type": "regex", "regex": "bob|fred"}):
    print(chunk.content, end="", flush=True)

Output:

bob

Test stop_at keyword

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content="Instruction: Always answer Questions in the form Question: What is 2+1?\nwith\nAnswer: 2+1=3\nQuestion: What is 21+1?\n"
    )
]
for chunk in llm.stream(messages, extra_body={"outlines_type": "text", "stop_at": "+"}):
    print(chunk.content, end="", flush=True)

To test keyboard interrupt handling, I ran the code below, interrupted it during generation, and then ran it again, and it seemed to work.

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content="What is your name?"
    )
]
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)

@isamu-isozaki (Contributor, Author)

Let me know if there should be more tests!

isamu-isozaki and others added 6 commits July 3, 2024 15:43
…OS compatible with Llama3 and 'stop_at' string from outlines
…_enhance_stop_condition

Enhance ExllamaV2Sampler with Temperature Parameter and Update EOS Token for Llama3 Compatibility
@edk208 merged commit 9c4c893 into blockentropy:main on Jul 4, 2024