
Commit

Merge branch 'outlines-dev:main' into token-cache
paul-grundmann committed Jun 14, 2024
2 parents b0c70bb + 18aaba1 commit cf9105c
Showing 25 changed files with 1,117 additions and 9 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
@@ -14,4 +14,4 @@ RUN --mount=source=.git,target=.git,type=bind \
pip install --no-cache-dir .[serve]

# https://outlines-dev.github.io/outlines/reference/vllm/
ENTRYPOINT python3 -m outlines.serve.serve
ENTRYPOINT ["python3", "-m", "outlines.serve.serve"]
1 change: 1 addition & 0 deletions README.md
@@ -45,6 +45,7 @@ Outlines 〰 has new releases and features coming every week. Make sure to ⭐ s
## Why should I use structured generation?

* It doesn't add any overhead during inference (cost-free)
* It allows Open Source models to beat closed source models ([Mistral](https://x.com/dottxtai/status/1797692104023363765), [GPT-4](https://x.com/dottxtai/status/1798443290913853770))
* [It speeds up inference](http://blog.dottxt.co/coalescence.html)
* [It improves the performance of base models (GSM8K)](http://blog.dottxt.co/performance-gsm8k.html)
* [It improves the performance of finetuned models (CoNNL)](https://predibase.com/blog/lorax-outlines-better-json-extraction-with-structured-generation-and-lora)
118 changes: 118 additions & 0 deletions docs/cookbook/deploy-using-cerebrium.md
@@ -0,0 +1,118 @@
# Run Outlines using Cerebrium

[Cerebrium](https://www.cerebrium.ai/) is a serverless AI infrastructure platform that makes it easier for companies to build and deploy AI-based applications. It offers serverless GPUs with low cold-start times across more than 12 varieties of GPU chips, autoscales on demand, and bills only for the compute you use.

In this guide we show how to use Cerebrium to run programs written with Outlines on GPUs in the cloud.

## Setup Cerebrium

First, install Cerebrium and log in to authenticate.

```bash
pip install cerebrium
cerebrium login
```

Then create your first project:

```bash
cerebrium init outlines-project
```

## Setup Environment and Hardware

You configure your environment and hardware in the `cerebrium.toml` file created by the `init` command above.

```toml
[cerebrium.hardware]
cpu = 2
memory = 14.0
gpu = "AMPERE A10"
gpu_count = 1
provider = "aws"
region = "us-east-1"

[cerebrium.dependencies.pip]
outlines = "==0.0.37"
transformers = "==4.38.2"
datasets = "==2.18.0"
accelerate = "==0.27.2"
```

## Setup inference

Running code in Cerebrium is like writing normal Python, with no special syntax. In a `main.py` file, specify the following:

```python
import outlines


model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

schema = """{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}"""

generator = outlines.generate.json(model, schema)
```
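Since the schema is passed as a plain string, a malformed brace would only surface at run time on the remote container. A quick local sanity check (our own addition, not part of the Cerebrium workflow, using only the standard library and a shortened copy of the schema) catches this before deploying:

```python
import json

# Shortened copy of the schema above; json.loads raises
# json.JSONDecodeError if the string is not valid JSON.
schema = """{
    "title": "Character",
    "type": "object",
    "required": ["name", "age", "armor", "weapon", "strength"]
}"""

parsed = json.loads(schema)
assert parsed["title"] == "Character"
assert "strength" in parsed["required"]
```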

On first deploy, Cerebrium downloads the model and stores it on disk, so subsequent calls load the model from disk.

Every function in Cerebrium is callable through an API endpoint. Code at the top level (i.e. not inside a function) runs only when the container is first spun up; subsequent calls simply execute the function you call.

To deploy an API that creates a new character when called with a prompt, add the following code to `main.py`:

```python
def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    return character
```


## Run on the cloud

```bash
cerebrium deploy
```

You will see your application deploy, install pip packages, and download the model. Once complete, it outputs a cURL request you can use to call your endpoint. Just remember to end the URL with the function you would like to call, in this case `/generate`. You should see your response returned!
32 changes: 32 additions & 0 deletions docs/reference/models/mlxlm.md
@@ -0,0 +1,32 @@
# mlx-lm

Outlines provides an integration with [mlx-lm](https://github.com/ml-explore/mlx-examples/tree/main/llms), allowing models to be run quickly on Apple Silicon via the [mlx](https://ml-explore.github.io/mlx/build/html/index.html) library.

## Installation

In addition to `outlines`, you must install the `mlx-lm` and `mlx` libraries. You must use a device that [supports Metal](https://support.apple.com/en-us/102894).
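For example, assuming the default PyPI package names (`mlx` and `mlx-lm`):

```bash
pip install outlines mlx mlx-lm
```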

## Using `models.mlxlm`

```python
from outlines import models

model = models.mlxlm("mlx-community/Meta-Llama-3-8B-Instruct-8bit")
```

With the loaded model, you can generate text or perform structured generation, e.g.

```python
from outlines import models, generate

model = models.mlxlm("mlx-community/Meta-Llama-3-8B-Instruct-8bit")

phone_number_pattern = "\\+?[1-9][0-9]{7,14}"
generator = generate.regex(model, phone_number_pattern)

model_output = generator("What's Jenny's number?\n")
print(model_output)
```
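Note that the pattern requires at least eight digits (one leading non-zero digit plus `{7,14}` more), so a bare seven-digit number will not match. You can check what the pattern admits with Python's `re` module before handing it to a model; this check (our own addition) needs neither `mlx` nor a model:

```python
import re

phone_number_pattern = "\\+?[1-9][0-9]{7,14}"

# fullmatch mirrors what structured generation enforces: the entire
# output must conform to the pattern, not just a substring of it.
assert re.fullmatch(phone_number_pattern, "18005551212")
assert re.fullmatch(phone_number_pattern, "+4915112345678")
assert re.fullmatch(phone_number_pattern, "8675309") is None  # only 7 digits
```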

For more examples, see the [cookbook](cookbook/index.md).
3 changes: 3 additions & 0 deletions docs/reference/models/tgi.md
@@ -0,0 +1,3 @@
# Text-generation-inference (TGI)

TGI uses Outlines to provide structured generation, see [their documentation](https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/using_guidance).
4 changes: 4 additions & 0 deletions docs/stylesheets/extra.css
@@ -78,6 +78,10 @@
background: #FFFFFF ! important
}

.language-toml {
background: #FFFFFF ! important
}

h1.title {
color: #FFFFFF;
margin: 0px 0px 5px;
2 changes: 1 addition & 1 deletion docs/welcome.md
@@ -6,7 +6,7 @@ Outlines〰 is a Python library that allows you to use Large Language Model in a

## What models do you support?

We support [Openai](reference/models/openai.md), but the true power of Outlines〰 is unleashed with Open Source models available via the [transformers](reference/models/transformers.md), [llama.cpp](reference/models/transformers.md), [exllama2](reference/models/exllamav2.md) and [mamba_ssm](reference/models/mamba.md) libraries. If you want to build and maintain an integration with another library, [get in touch][discord].
We support [Openai](reference/models/openai.md), but the true power of Outlines〰 is unleashed with Open Source models available via the [transformers](reference/models/transformers.md), [llama.cpp](reference/models/llamacpp.md), [exllama2](reference/models/exllamav2.md) and [mamba_ssm](reference/models/mamba.md) libraries. If you want to build and maintain an integration with another library, [get in touch][discord].

## What are the main features?

26 changes: 26 additions & 0 deletions examples/cerebrium/cerebrium.toml
@@ -0,0 +1,26 @@
[cerebrium.deployment]
name = "cerebrium"
python_version = "3.11"
cuda_version = "12"
include = "[./*, main.py, cerebrium.toml]"
exclude = "[.*]"
shell_commands = []

[cerebrium.hardware]
cpu = 2
memory = 14.0
gpu = "AMPERE A10"
gpu_count = 1
provider = "aws"
region = "us-east-1"

[cerebrium.scaling]
min_replicas = 0
max_replicas = 5
cooldown = 60

[cerebrium.dependencies.pip]
outlines = "==0.0.37"
transformers = "==4.38.2"
datasets = "==2.18.0"
accelerate = "==0.27.2"
43 changes: 43 additions & 0 deletions examples/cerebrium/main.py
@@ -0,0 +1,43 @@
import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

schema = {
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "maxLength": 10, "type": "string"},
        "age": {"title": "Age", "type": "integer"},
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {"title": "Strength", "type": "integer"},
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string",
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string",
        },
    },
}

generator = outlines.generate.json(model, schema)


def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    print(character)
    return character
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -101,6 +101,7 @@ nav:
      - Playing chess: cookbook/models_playing_chess.md
      - Run on the cloud:
          - BentoML: cookbook/deploy-using-bentoml.md
          - Cerebrium: cookbook/deploy-using-cerebrium.md
          - Modal: cookbook/deploy-using-modal.md
  - Docs:
      - reference/index.md
@@ -124,9 +125,11 @@ nav:
          - vLLM: reference/models/vllm.md
          - Llama.cpp: reference/models/llamacpp.md
          - Transformers: reference/models/transformers.md
          - MLX: reference/models/mlxlm.md
          - ExllamaV2: reference/models/exllamav2.md
          - Mamba: reference/models/mamba.md
          - OpenAI: reference/models/openai.md
          - TGI: reference/models/tgi.md

  - API Reference:
      - api/index.md
86 changes: 84 additions & 2 deletions outlines/fsm/json_schema.py
@@ -2,7 +2,7 @@
import json
import re
import warnings
from typing import Callable, Optional
from typing import Callable, Optional, Tuple

from jsonschema.protocols import Validator
from pydantic import create_model
@@ -96,6 +96,47 @@ def _get_num_items_pattern(min_items, max_items, whitespace_pattern):
return rf"{{{max(min_items - 1, 0)},{max_items - 1}}}"


def validate_quantifiers(
    min_bound: Optional[str], max_bound: Optional[str], start_offset: int = 0
) -> Tuple[str, str]:
    """
    Ensure that the bounds of a number are valid. Bounds are used as quantifiers in the regex.

    Parameters
    ----------
    min_bound
        The minimum value that the number can take.
    max_bound
        The maximum value that the number can take.
    start_offset
        Number of elements that are already present in the regex but still need to be counted.
        ex: if the regex is already "(-)?(0|[1-9][0-9])", we will always have at least 1 digit, so the start_offset is 1.

    Returns
    -------
    min_bound
        The minimum value that the number can take.
    max_bound
        The maximum value that the number can take.

    Raises
    ------
    ValueError
        If the minimum bound is greater than the maximum bound.
    TypeError or ValueError
        If either bound is not an integer or None.
    """
    min_bound = "" if min_bound is None else str(int(min_bound) - start_offset)
    max_bound = "" if max_bound is None else str(int(max_bound) - start_offset)
    if min_bound and max_bound:
        if int(max_bound) < int(min_bound):
            raise ValueError("max bound must be greater than or equal to min bound")
    return min_bound, max_bound


def to_regex(
resolver: Resolver, instance: dict, whitespace_pattern: Optional[str] = None
):
@@ -263,7 +304,7 @@ def to_regex(
            if int(max_items) < int(min_items):
                raise ValueError(
                    "maxLength must be greater than or equal to minLength"
                )  # FIXME this raises an error but is caught right away by the except (meant for int("") I assume)
        except ValueError:
            pass
        return f'"{STRING_INNER}{{{min_items},{max_items}}}"'
@@ -291,9 +332,50 @@ def to_regex(
        return type_to_regex["string"]

    elif instance_type == "number":
        bounds = {
            "minDigitsInteger",
            "maxDigitsInteger",
            "minDigitsFraction",
            "maxDigitsFraction",
            "minDigitsExponent",
            "maxDigitsExponent",
        }
        if bounds.intersection(set(instance.keys())):
            min_digits_integer, max_digits_integer = validate_quantifiers(
                instance.get("minDigitsInteger"),
                instance.get("maxDigitsInteger"),
                start_offset=1,
            )
            min_digits_fraction, max_digits_fraction = validate_quantifiers(
                instance.get("minDigitsFraction"), instance.get("maxDigitsFraction")
            )
            min_digits_exponent, max_digits_exponent = validate_quantifiers(
                instance.get("minDigitsExponent"), instance.get("maxDigitsExponent")
            )
            integers_quantifier = (
                f"{{{min_digits_integer},{max_digits_integer}}}"
                if min_digits_integer or max_digits_integer
                else "*"
            )
            fraction_quantifier = (
                f"{{{min_digits_fraction},{max_digits_fraction}}}"
                if min_digits_fraction or max_digits_fraction
                else "+"
            )
            exponent_quantifier = (
                f"{{{min_digits_exponent},{max_digits_exponent}}}"
                if min_digits_exponent or max_digits_exponent
                else "+"
            )
            return rf"((-)?(0|[1-9][0-9]{integers_quantifier}))(\.[0-9]{fraction_quantifier})?([eE][+-][0-9]{exponent_quantifier})?"
        return type_to_regex["number"]

    elif instance_type == "integer":
        if "minDigits" in instance or "maxDigits" in instance:
            min_digits, max_digits = validate_quantifiers(
                instance.get("minDigits"), instance.get("maxDigits"), start_offset=1
            )
            return rf"(-)?(0|[1-9][0-9]{{{min_digits},{max_digits}}})"
        return type_to_regex["integer"]

    elif instance_type == "array":
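The new `validate_quantifiers` helper and the digit-bound branches can be exercised in isolation. The sketch below copies the helper exactly as it appears in the diff and shows how `minDigits`/`maxDigits` become a regex quantifier; `start_offset=1` accounts for the leading digit the surrounding pattern already guarantees:

```python
import re
from typing import Optional, Tuple


def validate_quantifiers(
    min_bound: Optional[str], max_bound: Optional[str], start_offset: int = 0
) -> Tuple[str, str]:
    # Convert schema bounds into regex quantifier bounds, shifting by the
    # number of characters the surrounding pattern already guarantees.
    min_bound = "" if min_bound is None else str(int(min_bound) - start_offset)
    max_bound = "" if max_bound is None else str(int(max_bound) - start_offset)
    if min_bound and max_bound:
        if int(max_bound) < int(min_bound):
            raise ValueError("max bound must be greater than or equal to min bound")
    return min_bound, max_bound


# An integer constrained to 2..4 digits: the pattern already emits one
# leading non-zero digit, so the quantifier becomes {1,3}.
min_digits, max_digits = validate_quantifiers("2", "4", start_offset=1)
pattern = rf"(-)?(0|[1-9][0-9]{{{min_digits},{max_digits}}})"

assert pattern == r"(-)?(0|[1-9][0-9]{1,3})"
assert re.fullmatch(pattern, "42")
assert re.fullmatch(pattern, "-1234")
assert re.fullmatch(pattern, "7") is None      # too few digits
assert re.fullmatch(pattern, "12345") is None  # too many digits
```

Passing an inverted range (e.g. `("4", "2")`) raises the `ValueError` shown in the diff, which is how malformed schemas are rejected early.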
