Add cerebrium as deployment option in documentation #963

Merged
merged 1 commit into from
Jun 13, 2024
118 changes: 118 additions & 0 deletions docs/cookbook/deploy-using-cerebrium.md
@@ -0,0 +1,118 @@
# Run Outlines using Cerebrium

[Cerebrium](https://www.cerebrium.ai/) is a serverless AI infrastructure platform that makes it easier for companies to build and deploy AI-based applications. It offers serverless GPUs with low cold-start times, a choice of more than 12 GPU types, and automatic scaling, and you pay only for the compute you use.

In this guide we will show you how you can use Cerebrium to run programs written with Outlines on GPUs in the cloud.

## Setup Cerebrium

First, we install Cerebrium and log in to authenticate.

```bash
pip install cerebrium
cerebrium login
```

Then let us create our first project:

```bash
cerebrium init outlines-project
```

## Setup Environment and Hardware

You set up your environment and hardware in the `cerebrium.toml` file that the `cerebrium init` command above created.

```toml
[cerebrium.hardware]
cpu = 2
memory = 14.0
gpu = "AMPERE A10"
gpu_count = 1
provider = "aws"
region = "us-east-1"

[cerebrium.dependencies.pip]
outlines = "==0.0.37"
transformers = "==4.38.2"
datasets = "==2.18.0"
accelerate = "==0.27.2"
```

## Setup inference

Running code in Cerebrium is like writing normal Python, with no special syntax. In a `main.py` file, specify the following:

```python
import outlines


model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

schema = """{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}"""

generator = outlines.generate.json(model, schema)
```

On the first deploy, Cerebrium downloads the model and stores it on disk, so subsequent calls load the model from disk instead of downloading it again.

Every function in Cerebrium is callable through an API endpoint. Code at the top level (i.e., not inside a function) runs only once, when the container first spins up; subsequent calls run only the code in the function you call.
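
The load-once behaviour described above can be illustrated with plain Python, since module-level code runs a single time when the module is first imported. This is only a sketch: `load_model` is a hypothetical stand-in for an expensive model load, and nothing here calls Cerebrium or Outlines.

```python
LOAD_COUNT = 0  # counts how often the expensive load runs


def load_model(name: str) -> dict:
    """Hypothetical stand-in for an expensive model load."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": name}


# Module level: runs once, when the module is first imported.
MODEL = load_model("stand-in-model")


def generate(prompt: str) -> str:
    # Function body: runs on every call and reuses the loaded MODEL.
    return f"{MODEL['name']} received: {prompt}"


first = generate("a")
second = generate("b")
```

After both calls, `LOAD_COUNT` is still 1, which is why expensive setup such as loading the model belongs at the top of `main.py`.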

To deploy an API that creates a new character when called with a prompt, add the following code to `main.py`:

```python
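def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    # Wrap the description in Mistral's instruction format and generate
    # a character constrained to the schema above.
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    return character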
```
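
If you want to double-check a result before returning it, a small hand-written check against the schema's constraints could look like the sketch below. It assumes the generator returns the parsed JSON as a Python `dict`; `check_character` is a hypothetical helper, and the sample value is written by hand, not real model output.

```python
ARMOR = {"leather", "chainmail", "plate"}
WEAPON = {"sword", "axe", "mace", "spear", "bow", "crossbow"}
REQUIRED = ("name", "age", "armor", "weapon", "strength")


def check_character(character: dict) -> list[str]:
    """Hypothetical helper: list the ways a character violates the schema."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in character]
    if problems:
        return problems
    if not isinstance(character["name"], str) or len(character["name"]) > 10:
        problems.append("name must be a string of at most 10 characters")
    for key in ("age", "strength"):
        if not isinstance(character[key], int):
            problems.append(f"{key} must be an integer")
    if character["armor"] not in ARMOR:
        problems.append("armor is not an allowed value")
    if character["weapon"] not in WEAPON:
        problems.append("weapon is not an allowed value")
    return problems


# Hand-written sample, for illustration only.
sample = {"name": "Amiri", "age": 53, "armor": "leather", "weapon": "sword", "strength": 10}
problems = check_character(sample)
```

Because generation is already constrained by the schema, a check like this is mostly useful as a safeguard in tests.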


## Run on the cloud

```bash
cerebrium deploy
```

You will see your application deploy, install the pip packages, and download the model. Once the deployment completes, it outputs a curl request you can use to call your endpoint. Remember to end the URL with the name of the function you want to call, in this case `/generate`. You should then see your response returned.
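
The same call can be made from Python. The sketch below only builds the request and does not send it. The base URL and token are placeholders, and the bearer-token header and JSON body keyed by the function's parameter name are assumptions; copy the real values from the curl command that the deploy prints.

```python
import json
import urllib.request


def build_request(base_url: str, token: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the `generate` function (not sent here)."""
    url = base_url.rstrip("/") + "/generate"  # end the URL with the function name
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
request = build_request(
    "https://example.invalid/outlines-project",
    "YOUR_TOKEN",
    "Amiri, a 53 year old warrior woman with a sword and leather armor.",
)
# To send it: urllib.request.urlopen(request)
```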
4 changes: 4 additions & 0 deletions docs/stylesheets/extra.css
@@ -78,6 +78,10 @@
  background: #FFFFFF ! important
}

.language-toml {
  background: #FFFFFF ! important
}

h1.title {
  color: #FFFFFF;
  margin: 0px 0px 5px;
26 changes: 26 additions & 0 deletions examples/cerebrium/cerebrium.toml
@@ -0,0 +1,26 @@
[cerebrium.deployment]
name = "cerebrium"
python_version = "3.11"
cuda_version = "12"
include = "[./*, main.py, cerebrium.toml]"
exclude = "[.*]"
shell_commands = []

[cerebrium.hardware]
cpu = 2
memory = 14.0
gpu = "AMPERE A10"
gpu_count = 1
provider = "aws"
region = "us-east-1"

[cerebrium.scaling]
min_replicas = 0
max_replicas = 5
cooldown = 60

[cerebrium.dependencies.pip]
outlines = "==0.0.37"
transformers = "==4.38.2"
datasets = "==2.18.0"
accelerate = "==0.27.2"
43 changes: 43 additions & 0 deletions examples/cerebrium/main.py
@@ -0,0 +1,43 @@
import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

schema = {
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "maxLength": 10, "type": "string"},
        "age": {"title": "Age", "type": "integer"},
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {"title": "Strength", "type": "integer"},
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string",
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string",
        },
    },
}

generator = outlines.generate.json(model, schema)


def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    print(character)
    return character
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -101,6 +101,7 @@ nav:
- Playing chess: cookbook/models_playing_chess.md
- Run on the cloud:
- BentoML: cookbook/deploy-using-bentoml.md
- Cerebrium: cookbook/deploy-using-cerebrium.md
- Modal: cookbook/deploy-using-modal.md
- Docs:
- reference/index.md