Add llama.cpp integration
dtiarks authored and rlouf committed Jan 8, 2024
1 parent 417a2ca commit 03b749a
Showing 12 changed files with 449 additions and 5 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -3,3 +3,4 @@ __pycache__
*_version.py
docs/build
.coverage
.idea/
2 changes: 1 addition & 1 deletion docs/cookbook/index.md
@@ -1,6 +1,6 @@
# Examples

- [Classification](classification): Classify customer requests.
- [Classification](classification.md): Classify customer requests.
- [Named Entity Extraction](extraction.md): Extract information from pizza orders.
- [Dating Profile](dating_profiles.md): Build dating profiles from descriptions using prompt templating and JSON-guided generation.
- [Chain Of Density](chain_of_density.md): Summarize documents using chain of density prompting and JSON-guided generation.
15 changes: 15 additions & 0 deletions docs/reference/models/llamacpp.md
@@ -0,0 +1,15 @@
# Llama.cpp

!!! Installation

You need to install the `llama-cpp-python` library to be able to use these models in Outlines.

Outlines provides an integration with [Llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama-cpp-python library](https://github.com/abetlen/llama-cpp-python). Llama.cpp makes it possible to run quantized models on machines with limited compute.

Assuming [Phi-2's weights](https://huggingface.co/TheBloke/phi-2-GGUF) are in the current directory:

```python
from outlines import models, generate

model = models.llamacpp("./phi-2.Q4_K_M.gguf", device="cpu")
```
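Once loaded, the model can be passed to any Outlines generator. A minimal sketch of plain-text generation, following the same pattern as the JSON example elsewhere in this commit (the prompt and `max_tokens` value are illustrative, and the weights above are assumed to be downloaded):

```python
from outlines import models, generate

# Load the quantized weights on CPU, as above.
model = models.llamacpp("./phi-2.Q4_K_M.gguf", device="cpu")

# Build a plain-text generator; structured generators such as
# generate.json accept the model the same way.
generator = generate.text(model, max_tokens=16)

print(generator("Question: What is the capital of France?\nAnswer:"))
```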
@@ -1,5 +1,9 @@
# Generate text with the OpenAI API

!!! Installation

You need to install the `openai` and `tiktoken` libraries to be able to use the OpenAI API in Outlines.

Outlines supports models available via the OpenAI Chat API, e.g. ChatGPT and GPT-4. The following models can be used with Outlines:

```python
@@ -12,6 +16,7 @@ print(type(model))
# OpenAI
```


It is possible to pass a system message to the model when initializing it:

```python
4 changes: 1 addition & 3 deletions docs/reference/vllm.md
@@ -49,9 +49,7 @@ curl http://127.0.0.1:8000/generate \

Instead of `curl`, you can also use the [requests][requests]{:target="_blank"} library from another Python program.

Please consult the [vLLM documentation][vllm]{:target="_blank"} for details on additional request parameters.

You can also [read the code](https://github.com/outlines-dev/outlines/blob/main/outlines/serve/serve.py) in case you need to customize the solution to your needs.
Please consult the [vLLM documentation][vllm]{:target="_blank"} for details on additional request parameters. You can also [read the code](https://github.com/outlines-dev/outlines/blob/main/outlines/serve/serve.py) in case you need to customize the solution to your needs.

[requests]: https://requests.readthedocs.io/en/latest/
[vllm]: https://docs.vllm.ai/en/latest/index.html
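The `requests` variant can be sketched as follows. This is a minimal sketch, not part of the original guide: the `prompt` and `schema` values are illustrative placeholders, and the actual POST assumes the server above is listening on port 8000, so it is left commented out.

```python
import json

# Request body mirroring the curl example; values are illustrative.
payload = {
    "prompt": "What is the capital of France?",
    "schema": {"type": "string", "maxLength": 5},
}

# With the server running, the request would be sent like this:
#   import requests
#   response = requests.post("http://127.0.0.1:8000/generate", json=payload)
#   print(response.json())

print(json.dumps(payload, indent=2))
```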
46 changes: 46 additions & 0 deletions examples/llamacpp_example.py
@@ -0,0 +1,46 @@
from enum import Enum

import torch
from pydantic import BaseModel, constr

import outlines


class Weapon(str, Enum):
sword = "sword"
axe = "axe"
mace = "mace"
spear = "spear"
bow = "bow"
crossbow = "crossbow"


class Armor(str, Enum):
leather = "leather"
chainmail = "chainmail"
plate = "plate"


class Character(BaseModel):
name: constr(max_length=10)
age: int
armor: Armor
weapon: Weapon
strength: int


if __name__ == "__main__":
# Download model from https://huggingface.co/TheBloke/phi-2-GGUF
model = outlines.models.llamacpp("./phi-2.Q3_K_M.gguf", device="cpu")

# Construct guided sequence generator
generator = outlines.generate.json(model, Character, max_tokens=512)

# Draw a sample
rng = torch.Generator(device="cpu")
rng.manual_seed(789005)

prompt = "Instruct: You are a leading role play gamer. You have seen thousands of different characters and their attributes.\nPlease return a JSON object with common attributes of an RPG character. Give me a character description\nOutput:"

sequence = generator(prompt, rng=rng)
print(sequence)
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -126,7 +126,8 @@ nav:
- Prompt templating: reference/prompting.md
- Outlines functions: reference/functions.md
- Models:
- OpenAI: reference/openai_text_generation.md
- OpenAI: reference/models/openai.md
- Llama.cpp: reference/models/llamacpp.md

- API Reference:
- api/index.md
1 change: 1 addition & 0 deletions outlines/__init__.py
@@ -11,6 +11,7 @@
"clear_cache",
"disable_cache",
"get_cache",
"Function",
"prompt",
"vectorize",
]
1 change: 1 addition & 0 deletions outlines/models/__init__.py
@@ -8,6 +8,7 @@
from .awq import awq
from .exllamav2 import exl2
from .gptq import gptq
from .llamacpp import LlamaCpp, llamacpp
from .mamba import Mamba, mamba
from .openai import OpenAI, openai
from .transformers import Transformer, transformers
