
Commit

Merge branch 'outlines-dev:main' into token-cache
paul-grundmann committed Jun 14, 2024
2 parents b0c70bb + 18aaba1 commit cf9105c
Showing 25 changed files with 1,117 additions and 9 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
@@ -14,4 +14,4 @@ RUN --mount=source=.git,target=.git,type=bind \
pip install --no-cache-dir .[serve]

# https://outlines-dev.github.io/outlines/reference/vllm/
ENTRYPOINT python3 -m outlines.serve.serve
ENTRYPOINT ["python3", "-m", "outlines.serve.serve"]
1 change: 1 addition & 0 deletions README.md
@@ -45,6 +45,7 @@ Outlines 〰 has new releases and features coming every week. Make sure to ⭐ s
## Why should I use structured generation?

* It doesn't add any overhead during inference (cost-free)
* It allows Open Source models to beat closed source models ([Mistral](https://x.com/dottxtai/status/1797692104023363765), [GPT-4](https://x.com/dottxtai/status/1798443290913853770))
* [It speeds up inference](http://blog.dottxt.co/coalescence.html)
* [It improves the performance of base models (GSM8K)](http://blog.dottxt.co/performance-gsm8k.html)
* [It improves the performance of finetuned models (CoNNL)](https://predibase.com/blog/lorax-outlines-better-json-extraction-with-structured-generation-and-lora)
118 changes: 118 additions & 0 deletions docs/cookbook/deploy-using-cerebrium.md
@@ -0,0 +1,118 @@
# Run Outlines using Cerebrium

[Cerebrium](https://www.cerebrium.ai/) is a serverless AI infrastructure platform that makes it easier for companies to build and deploy AI-based applications. It offers serverless GPUs with low cold-start times across more than 12 varieties of GPU chips, autoscales on demand, and bills only for the compute you use.

In this guide we show how to use Cerebrium to run programs written with Outlines on GPUs in the cloud.

## Setup Cerebrium

First, install Cerebrium and log in to authenticate.

```bash
pip install cerebrium
cerebrium login
```

Then create your first project:

```bash
cerebrium init outlines-project
```

## Setup Environment and Hardware

You configure your environment and hardware in the `cerebrium.toml` file created by the `init` command above.

```toml
[cerebrium.hardware]
cpu = 2
memory = 14.0
gpu = "AMPERE A10"
gpu_count = 1
provider = "aws"
region = "us-east-1"

[cerebrium.dependencies.pip]
outlines = "==0.0.37"
transformers = "==4.38.2"
datasets = "==2.18.0"
accelerate = "==0.27.2"
```

## Setup inference

Running code in Cerebrium is like writing normal Python, with no special syntax. In a `main.py` file, specify the following:

```python
import outlines


model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

schema = """{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}"""

generator = outlines.generate.json(model, schema)
```
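Since the schema is passed as a plain string, a malformed brace would only surface at run time on the remote container. A quick local sanity check (our own addition, not part of the Cerebrium workflow, using only the standard library and a shortened copy of the schema) catches this before deploying:

```python
import json

# Shortened copy of the schema above; json.loads raises
# json.JSONDecodeError if the string is not valid JSON.
schema = """{
    "title": "Character",
    "type": "object",
    "required": ["name", "age", "armor", "weapon", "strength"]
}"""

parsed = json.loads(schema)
assert parsed["title"] == "Character"
assert "strength" in parsed["required"]
```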

On first deploy, Cerebrium downloads the model and stores it on disk, so subsequent calls load the model from disk.

Every function in Cerebrium is callable through an API endpoint. Code at the top level (i.e. not inside a function) runs only when the container is first spun up; subsequent calls simply execute the function you call.

To deploy an API that creates a new character when called with a prompt, add the following code to `main.py`:

```python
def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    return character
```


## Run on the cloud

```bash
cerebrium deploy
```

You will see your application deploy, install pip packages, and download the model. Once complete, it outputs a cURL request you can use to call your endpoint. Just remember to end the URL with the function you would like to call, in this case `/generate`. You should see your response returned!
32 changes: 32 additions & 0 deletions docs/reference/models/mlxlm.md
@@ -0,0 +1,32 @@
# mlx-lm

Outlines provides an integration with [mlx-lm](https://github.com/ml-explore/mlx-examples/tree/main/llms), allowing models to be run quickly on Apple Silicon via the [mlx](https://ml-explore.github.io/mlx/build/html/index.html) library.

## Installation

In addition to `outlines`, you must install the `mlx-lm` and `mlx` libraries. You must use a device that [supports Metal](https://support.apple.com/en-us/102894).
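For example, assuming the default PyPI package names (`mlx` and `mlx-lm`):

```bash
pip install outlines mlx mlx-lm
```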

## Using `models.mlxlm`

```python
from outlines import models

model = models.mlxlm("mlx-community/Meta-Llama-3-8B-Instruct-8bit")
```

With the loaded model, you can generate text or perform structured generation, e.g.

```python
from outlines import models, generate

model = models.mlxlm("mlx-community/Meta-Llama-3-8B-Instruct-8bit")

phone_number_pattern = "\\+?[1-9][0-9]{7,14}"
generator = generate.regex(model, phone_number_pattern)

model_output = generator("What's Jenny's number?\n")
print(model_output)
```
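Note that the pattern requires at least eight digits (one leading non-zero digit plus `{7,14}` more), so a bare seven-digit number will not match. You can check what the pattern admits with Python's `re` module before handing it to a model; this check (our own addition) needs neither `mlx` nor a model:

```python
import re

phone_number_pattern = "\\+?[1-9][0-9]{7,14}"

# fullmatch mirrors what structured generation enforces: the entire
# output must conform to the pattern, not just a substring of it.
assert re.fullmatch(phone_number_pattern, "18005551212")
assert re.fullmatch(phone_number_pattern, "+4915112345678")
assert re.fullmatch(phone_number_pattern, "8675309") is None  # only 7 digits
```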

For more examples, see the [cookbook](cookbook/index.md).
3 changes: 3 additions & 0 deletions docs/reference/models/tgi.md
@@ -0,0 +1,3 @@
# Text-generation-inference (TGI)

TGI uses Outlines to provide structured generation, see [their documentation](https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/using_guidance).
4 changes: 4 additions & 0 deletions docs/stylesheets/extra.css
@@ -78,6 +78,10 @@
background: #FFFFFF ! important
}

.language-toml {
background: #FFFFFF ! important
}

h1.title {
color: #FFFFFF;
margin: 0px 0px 5px;
2 changes: 1 addition & 1 deletion docs/welcome.md
@@ -6,7 +6,7 @@ Outlines〰 is a Python library that allows you to use Large Language Model in a

## What models do you support?

We support [Openai](reference/models/openai.md), but the true power of Outlines〰 is unleashed with Open Source models available via the [transformers](reference/models/transformers.md), [llama.cpp](reference/models/transformers.md), [exllama2](reference/models/exllamav2.md) and [mamba_ssm](reference/models/mamba.md) libraries. If you want to build and maintain an integration with another library, [get in touch][discord].
We support [Openai](reference/models/openai.md), but the true power of Outlines〰 is unleashed with Open Source models available via the [transformers](reference/models/transformers.md), [llama.cpp](reference/models/llamacpp.md), [exllama2](reference/models/exllamav2.md) and [mamba_ssm](reference/models/mamba.md) libraries. If you want to build and maintain an integration with another library, [get in touch][discord].

## What are the main features?

26 changes: 26 additions & 0 deletions examples/cerebrium/cerebrium.toml
@@ -0,0 +1,26 @@
[cerebrium.deployment]
name = "cerebrium"
python_version = "3.11"
cuda_version = "12"
include = "[./*, main.py, cerebrium.toml]"
exclude = "[.*]"
shell_commands = []

[cerebrium.hardware]
cpu = 2
memory = 14.0
gpu = "AMPERE A10"
gpu_count = 1
provider = "aws"
region = "us-east-1"

[cerebrium.scaling]
min_replicas = 0
max_replicas = 5
cooldown = 60

[cerebrium.dependencies.pip]
outlines = "==0.0.37"
transformers = "==4.38.2"
datasets = "==2.18.0"
accelerate = "==0.27.2"
43 changes: 43 additions & 0 deletions examples/cerebrium/main.py
@@ -0,0 +1,43 @@
import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

schema = {
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "maxLength": 10, "type": "string"},
        "age": {"title": "Age", "type": "integer"},
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {"title": "Strength", "type": "integer"},
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string",
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string",
        },
    },
}

generator = outlines.generate.json(model, schema)


def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    print(character)
    return character
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -101,6 +101,7 @@ nav:
      - Playing chess: cookbook/models_playing_chess.md
      - Run on the cloud:
          - BentoML: cookbook/deploy-using-bentoml.md
          - Cerebrium: cookbook/deploy-using-cerebrium.md
          - Modal: cookbook/deploy-using-modal.md
  - Docs:
      - reference/index.md
@@ -124,9 +125,11 @@ nav:
          - vLLM: reference/models/vllm.md
          - Llama.cpp: reference/models/llamacpp.md
          - Transformers: reference/models/transformers.md
          - MLX: reference/models/mlxlm.md
          - ExllamaV2: reference/models/exllamav2.md
          - Mamba: reference/models/mamba.md
          - OpenAI: reference/models/openai.md
          - TGI: reference/models/tgi.md

  - API Reference:
      - api/index.md
86 changes: 84 additions & 2 deletions outlines/fsm/json_schema.py
@@ -2,7 +2,7 @@
import json
import re
import warnings
from typing import Callable, Optional
from typing import Callable, Optional, Tuple

from jsonschema.protocols import Validator
from pydantic import create_model
@@ -96,6 +96,47 @@ def _get_num_items_pattern(min_items, max_items, whitespace_pattern):
return rf"{{{max(min_items - 1, 0)},{max_items - 1}}}"


def validate_quantifiers(
    min_bound: Optional[str], max_bound: Optional[str], start_offset: int = 0
) -> Tuple[str, str]:
    """
    Ensure that the bounds of a number are valid. Bounds are used as quantifiers in the regex.

    Parameters
    ----------
    min_bound
        The minimum value that the number can take.
    max_bound
        The maximum value that the number can take.
    start_offset
        Number of elements that are already present in the regex but still need to be counted.
        ex: if the regex is already "(-)?(0|[1-9][0-9])", we will always have at least 1 digit, so the start_offset is 1.

    Returns
    -------
    min_bound
        The minimum value that the number can take.
    max_bound
        The maximum value that the number can take.

    Raises
    ------
    ValueError
        If the minimum bound is greater than the maximum bound.
    TypeError or ValueError
        If either bound is not an integer or None.
    """
    min_bound = "" if min_bound is None else str(int(min_bound) - start_offset)
    max_bound = "" if max_bound is None else str(int(max_bound) - start_offset)
    if min_bound and max_bound:
        if int(max_bound) < int(min_bound):
            raise ValueError("max bound must be greater than or equal to min bound")
    return min_bound, max_bound


def to_regex(
resolver: Resolver, instance: dict, whitespace_pattern: Optional[str] = None
):
@@ -263,7 +304,7 @@ def to_regex(
            if int(max_items) < int(min_items):
                raise ValueError(
                    "maxLength must be greater than or equal to minLength"
                )  # FIXME this raises an error but is caught right away by the except (meant for int("") I assume)
        except ValueError:
            pass
        return f'"{STRING_INNER}{{{min_items},{max_items}}}"'
@@ -291,9 +332,50 @@ def to_regex(
        return type_to_regex["string"]

    elif instance_type == "number":
        bounds = {
            "minDigitsInteger",
            "maxDigitsInteger",
            "minDigitsFraction",
            "maxDigitsFraction",
            "minDigitsExponent",
            "maxDigitsExponent",
        }
        if bounds.intersection(set(instance.keys())):
            min_digits_integer, max_digits_integer = validate_quantifiers(
                instance.get("minDigitsInteger"),
                instance.get("maxDigitsInteger"),
                start_offset=1,
            )
            min_digits_fraction, max_digits_fraction = validate_quantifiers(
                instance.get("minDigitsFraction"), instance.get("maxDigitsFraction")
            )
            min_digits_exponent, max_digits_exponent = validate_quantifiers(
                instance.get("minDigitsExponent"), instance.get("maxDigitsExponent")
            )
            integers_quantifier = (
                f"{{{min_digits_integer},{max_digits_integer}}}"
                if min_digits_integer or max_digits_integer
                else "*"
            )
            fraction_quantifier = (
                f"{{{min_digits_fraction},{max_digits_fraction}}}"
                if min_digits_fraction or max_digits_fraction
                else "+"
            )
            exponent_quantifier = (
                f"{{{min_digits_exponent},{max_digits_exponent}}}"
                if min_digits_exponent or max_digits_exponent
                else "+"
            )
            return rf"((-)?(0|[1-9][0-9]{integers_quantifier}))(\.[0-9]{fraction_quantifier})?([eE][+-][0-9]{exponent_quantifier})?"
        return type_to_regex["number"]

    elif instance_type == "integer":
        if "minDigits" in instance or "maxDigits" in instance:
            min_digits, max_digits = validate_quantifiers(
                instance.get("minDigits"), instance.get("maxDigits"), start_offset=1
            )
            return rf"(-)?(0|[1-9][0-9]{{{min_digits},{max_digits}}})"
        return type_to_regex["integer"]

    elif instance_type == "array":
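The new `validate_quantifiers` helper and the digit-bound branches can be exercised in isolation. The sketch below copies the helper exactly as it appears in the diff and shows how `minDigits`/`maxDigits` become a regex quantifier; `start_offset=1` accounts for the leading digit the surrounding pattern already guarantees:

```python
import re
from typing import Optional, Tuple


def validate_quantifiers(
    min_bound: Optional[str], max_bound: Optional[str], start_offset: int = 0
) -> Tuple[str, str]:
    # Convert schema bounds into regex quantifier bounds, shifting by the
    # number of characters the surrounding pattern already guarantees.
    min_bound = "" if min_bound is None else str(int(min_bound) - start_offset)
    max_bound = "" if max_bound is None else str(int(max_bound) - start_offset)
    if min_bound and max_bound:
        if int(max_bound) < int(min_bound):
            raise ValueError("max bound must be greater than or equal to min bound")
    return min_bound, max_bound


# An integer constrained to 2..4 digits: the pattern already emits one
# leading non-zero digit, so the quantifier becomes {1,3}.
min_digits, max_digits = validate_quantifiers("2", "4", start_offset=1)
pattern = rf"(-)?(0|[1-9][0-9]{{{min_digits},{max_digits}}})"

assert pattern == r"(-)?(0|[1-9][0-9]{1,3})"
assert re.fullmatch(pattern, "42")
assert re.fullmatch(pattern, "-1234")
assert re.fullmatch(pattern, "7") is None      # too few digits
assert re.fullmatch(pattern, "12345") is None  # too many digits
```

Passing an inverted range (e.g. `("4", "2")`) raises the `ValueError` shown in the diff, which is how malformed schemas are rejected early.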
