diff --git a/README.md b/README.md index 0337b0e6a..a30273972 100644 --- a/README.md +++ b/README.md @@ -6,76 +6,103 @@ Build _reliable_ workflows based on interactions with generative models. - +[Prompting](#prompting) • +[Controlled generation](#controlled-generation) • +[Agents](#agents-example) • +[Sampling](#sampling-uncertainty-simulation-based-inference) • +[Examples](#examples) + -## Prompt management +**Outlines** allows you to control and diagnose interactions with LLMs more effectively. Modern language models are powerful and versatile, but the way they interface with existing systems [can be very brittle](https://github.com/Significant-Gravitas/Auto-GPT/labels/invalid_json), their outputs [can be unreliable](https://arxiv.org/abs/2302.04023), and complex workflows (agents) can introduce a lot of error-prone code duplication. Outlines provides robust prompting primitives that separate the prompting from the execution logic and lead to simple implementations of few-shot generation, ReAct, meta-prompting, agents, etc. Outlines helps developers control text generation and produce predictable outputs that make the interaction with user code more robust. Its sampling-first approach allows one to diagnose issues with model-generated output more easily, and to implement more robust generation methods such as [self-consistency](https://arxiv.org/abs/2203.11171) or [DiVeRSe](https://arxiv.org/abs/2206.02336). -Outlines makes it easier to write and manage prompts by encapsulating templates -inside "template functions". These functions make it possible to neatly separate -the prompt logic from the general program logic; they can be imported from other -modules and libraries. +**Outlines** is designed as a library that integrates well with the broader Python environment. Generation can be interleaved with control flow or custom function calls, and prompts can be imported from other modules or libraries. 
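The self-consistency method mentioned above boils down to sampling several completions and keeping the majority answer. A minimal sketch in plain Python (not the Outlines API), assuming the completions have already been sampled and parsed:

```python
from collections import Counter

def majority_vote(answers):
    """Self-consistency: keep the most frequent answer among sampled completions."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical answers sampled for the same prompt
samples = ["35", "35", "29", "35", "132"]
print(majority_vote(samples))  # prints: 35
```

Ties fall to the first answer encountered; a real implementation would normalize the answers before voting.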
-Template functions use the Jinja2 templating engine to help build complex -prompts (like few-shot examples) in a concise manner: -``` python -import outlines.text as text +## Features +- [x] Simple and powerful prompting primitives based on the [Jinja templating engine](https://jinja.palletsprojects.com/). +- [x] Interleave completions with loops, conditionals, and custom Python functions +- [x] Caching of generations +- [x] Integration with OpenAI and HuggingFace models +- [x] Controlled generation, including multiple choice, type constraints and dynamic stopping +- [x] Sampling of multiple sequences -@text.prompt -def few_shot_examples(question, examples): - """Something something - {% for example in examples %} - EXAMPLE: {{ example }} - {% endfor %} +## Installation - QUESTION: {{ question }} - Let's think step by step. +**Outlines** is available on PyPI: - """ +``` bash +pip install outlines ``` -Functions can also be _partially evaluated_ just like any function, which can be useful when building agents: +## Prompting + +Writing prompts by concatenating strings in pure Python quickly becomes +cumbersome: the prompt building logic gets entangled with the rest of the +program, and the structure of the rendered prompt is obfuscated. **Outlines** +makes it easier to write and manage prompts by encapsulating templates inside +"template functions". + +These functions make it possible to neatly separate the prompt logic from the +general program logic; they can be imported from other modules and libraries. 
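As the feature list above notes, these primitives are built on the Jinja templating engine. The rendering step can be sketched with `jinja2` directly; the template below is a hypothetical stand-in for what a template function contains, not the outlines API:

```python
import textwrap
from jinja2 import Template

# A few-shot template: one line per example, then the item to label.
source = textwrap.dedent(
    """\
    {% for example in examples %}
    {{ example[0] }} // {{ example[1] }}
    {% endfor %}
    {{ to_label }} //"""
)
# trim_blocks/lstrip_blocks remove the whitespace the {% %} tags would leave behind
template = Template(source, trim_blocks=True, lstrip_blocks=True)
prompt = template.render(
    examples=[("Recommended", "Positive")], to_label="Just awesome"
)
print(prompt)
# Recommended // Positive
# Just awesome //
```

The point of wrapping this machinery in a decorated function is that the template lives next to its signature and can be imported like any other callable.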
+ +Template functions require no superfluous abstraction: they use the Jinja2 +templating engine to help build complex prompts in a concise manner: ``` python -import functools as ft import outlines.text as text +import outlines.models as models +examples = [ + ("The food was disgusting", "Negative"), + ("We had a fantastic night", "Positive"), + ("Recommended", "Positive"), + ("The waiter was rude", "Negative") +] + @text.prompt -def my_agent(name, goals): - """Your name is {{ name }}. +def labelling(to_label, examples): + """You are a sentiment-labelling assistant. - GOALS: - {% for goal in goals %} - {{ loop.counter }}. {{ goal }} + {% for example in examples %} + {{ example[0] }} // {{ example[1] }} {% endfor %} + {{ to_label }} // """ - -jarvis = ft.partial(my_agent, "JARVIS") +model = models.text_completion.openai("text-davinci-003") +prompt = labelling("Just awesome", examples) +answer = model(prompt) ``` -The template contained in template functions remains accessible: +## Chaining with loops and conditionals ([example](https://github.com/normal-computing/outlines/blob/readme/examples/react.py)) + +**Outlines** comes with very few abstractions, and is designed to blend into existing code and integrate with the rest of the ecosystem. ``` python -import outlines.text as text +reviews = ["Just awesome", "Avoid", "Will come back"] +def send_notification(review): + """This function sends a notification with the review's content.""" + ... 
-@text.prompt -def prompt(): - "I am accessible" +for review in reviews: + prompt = labelling(review, examples) + answer = model(prompt) + if answer == "Positive": + send_notification(review) +``` +## Agents ([example](https://github.com/normal-computing/outlines/blob/readme/examples/babyagi.py)) -prompt.template -# I am accessible -``` +**Outlines** makes building agents like [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT), [BabyAGI](https://github.com/yoheinakajima/babyagi), [ViperGPT](https://viper.cs.columbia.edu/) or [Transformers Agent](https://huggingface.co/docs/transformers/transformers_agents) easier by removing boilerplate prompting code. ### Tools -Prior work has shown that we can teach language models to call external functions to get additional informations or perform tasks, by encoding the functions' description in the prompt. To avoid duplicating information between the function definition and the description passed to the prompt we define custom Jinja filters that can extract the function's name, description, signature and source: +We can teach language models to call external functions to get additional information or perform tasks, by encoding the functions' description in the prompt. To avoid duplicating information between the function definition and the description passed to the prompt, we define custom Jinja filters that can extract the function's name, description, signature and source: ``` python @@ -94,7 +121,7 @@ def wikipedia_search(query: str): @text.prompt -def my_commands(tools: List[Callable]): +def agent(tools: List[Callable]): """AVAILABLE COMMANDS: {% for tool in tools %} @@ -141,88 +168,78 @@ joke_ppt(Joke) # } ``` +## Controlled generation -## Natural language functions +The first step towards reliability of systems that include large language models is to ensure that there is a well-defined interface between their output and user-defined code. 
**Outlines** provides ways to control the generation of language models to make their output more predictable. -Large language models can be prompted so their output can be parsed into a data structure that can be manipulated by programming languages. The combination prompt + model call + output parser can thus be thought as a "natural language" function. +You can stop the generation after a given sequence has been found: ``` python -import json -import outlines.text as text -import outlines.models as models - +answer = model("Tell me a one-sentence joke.", stop_at=["."]) +``` -@text.prompt -def prime_numbers(n: int): - """Return a list that contains all prime numbers between 1 and {{ n }}. +You can reduce the completion to a choice between multiple possibilities: - The output must be parsable as a Python list. - """ +``` python +prompt = labelling("Just awesome", examples) +answer = model(prompt, is_in=["Positive", "Negative"]) +``` -def parse(result): - return json.loads(result) +You can require the generated sequence to be an int or a float: +``` python +import outlines.models as models -get_prime_numbers = text.function( - models.text_completion.openai("gpt-3.5-turbo"), - prime_numbers, - parse -) +model = models.text_completion.hf("sshleifer/tiny-gpt2") +answer = model("2 + 2 = ", type="int") +print(answer) +# 4 -get_prime_numbers(10) -# [2, 3, 5, 7] +model = models.text_completion.hf("sshleifer/tiny-gpt2") +answer = model("1.7 + 3.2 = ", type="float") +print(answer) +# 4.9 ``` -For more complex outputs one can pass a Pydantic model to `text.function`, which will be used to parse the output: -``` python -from pydantic import BaseModel -import outlines.text as text +## Sampling ([uncertainty](https://github.com/normal-computing/outlines/blob/readme/examples/sampling.ipynb), [simulation-based inference](https://github.com/normal-computing/outlines/blob/readme/examples/simulation_based_inference.ipynb)) +Outlines is strictly sampling based, and focused on using 
methods such as [self-consistency](https://arxiv.org/abs/2203.11171), [adaptive consistency](https://arxiv.org/abs/2305.11860), [DiVeRSe](https://arxiv.org/abs/2206.02336), [Tree of thoughts](https://arxiv.org/abs/2305.10601), [lattice sampling](https://arxiv.org/abs/2112.07660), etc. Several samples can be obtained using the `num_samples` keyword argument: -class Joke(BaseModel): - joke: str - explanation: str +``` python +import outlines.models as models -@text.prompt -def joke_ppt(response_model): - """Tell a joke and explain why the joke is funny. +model = models.text_completion.hf("sshleifer/tiny-gpt2") +answer = model("2 + 2 = ", num_samples=5) +print(answer) +# [4, 5, 4, 4, 4] +``` - RESPONSE FORMAT: - {{ response_model | schema }} - """ +The focus on sampling allows us to explore different ideas, such as [using the diversity of answers to evaluate the model's uncertainty](https://github.com/normal-computing/outlines/blob/readme/examples/sampling.ipynb), or [simulation-based inference to optimize the prompt](https://github.com/normal-computing/outlines/blob/readme/examples/simulation_based_inference.ipynb). -tell_a_joke = text.function( - models.text_completion.openai("gpt-3.5-turbo"), - joke_ppt, - Joke -) +## Contributing -tell_a_joke(Joke) -# [2, 3, 5, 7] -``` +### What contributions? -# Controlled generation +We currently only accept bug fixes and documentation contributions. If you have a feature request, please start a new [discussion](https://github.com/normal-computing/outlines/discussions). The issue tracker is only intended for actionable items. -Outlines offers mechanisms to specify high-level constraints on the text generations: +### How to contribute? 
-- `stop_at` allows to stop the generation once a particular word, sequence of symbol had been generated; -- `is_in` allows to constrain the model to generate an answer chosen among a set of possible answers; -- `type` allows to constrain the model's output to either `"int"`s or `"float"`s; +Run `pip install -e .[test]` or `conda env create -f environment.yml`. To build the documentation you will also need to run `pip install -r requirements-doc.txt`. -Coming: +Before pushing your code to the repository, please run `pre-commit run --all-files` and `pytest` to make sure that the code is formatted correctly and that the tests pass. -- Ability to constrain the output to a JSON with a given structure; -- Ability to constrain the output to a List; -- Ability to constrain the output to be Python code; +Do not hesitate to open a draft PR before your contribution is ready, especially if you have questions and/or need feedback. -# Examples +## Examples - [Pick the odd one out](https://github.com/normal-computing/outlines/blob/main/examples/pick_odd_one_out.py) - [Meta prompting](https://github.com/normal-computing/outlines/blob/main/examples/meta_prompting.py) - [ReAct](https://github.com/normal-computing/outlines/blob/main/examples/meta_prompting.py) - [Generate code to solve math problems](https://github.com/normal-computing/outlines/blob/main/examples/dust/math-generate-code.py) - [BabyAGI](https://github.com/normal-computing/outlines/blob/main/examples/babyagi.py) +- [Uncertainty](https://github.com/normal-computing/outlines/blob/readme/examples/sampling.ipynb) +- [Simulation-based inference](https://github.com/normal-computing/outlines/blob/readme/examples/simulation_based_inference.ipynb) diff --git a/examples/sampling.ipynb b/examples/sampling.ipynb new file mode 100644 index 000000000..0743bb9d2 --- /dev/null +++ b/examples/sampling.ipynb @@ -0,0 +1,401 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "62129e1a-e9de-454e-a714-35ccbcf0b518", + 
"metadata": {}, + "outputs": [], + "source": [ + "import functools as ft\n", + "import re\n", + "\n", + "import numpy as np\n", + "import matplotlib.pylab as plt\n", + "\n", + "import outlines.models as models\n", + "import outlines.text as text" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "b20aafe8-b7a3-4df4-878f-b48b74e131df", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "env: OPENAI_API_KEY=# your key here\n" + ] + } + ], + "source": [ + "%env OPENAI_API_KEY= # your key here" + ] + }, + { + "cell_type": "markdown", + "id": "2a3514d6-d5d7-46e9-9b69-1251d337e094", + "metadata": {}, + "source": [ + "In this example we will look at completion results for questions from the GSM8K dataset, using few-shot prompts with 8 examples. We first use `outlines.text.prompt` to build the few-shot prompt. `outlines.text.prompt` is a decorator around \"prompt functions\" which contain the prompt template in their docstring. Outlines uses the Jinja2 templating engine to render the prompt when the function is called with the variables' values; it thus allows you to build complex prompts very easily." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ffe8bb11-6b51-4fe7-bfb3-c62556a60db8", + "metadata": {}, + "outputs": [], + "source": [ + "examples = [\n", + " {\n", + " \"question\": \"There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?\",\n", + " \"answer\": \"We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted. So, they must have planted 21 - 15 = 6 trees. The answer is 6.\",\n", + " },\n", + " {\n", + " \"question\": \"If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?\",\n", + " \"answer\": \"There are 3 cars in the parking lot already. 2 more arrive. 
Now there are 3 + 2 = 5 cars. The answer is 5.\",\n", + " },\n", + " {\n", + " \"question\": \"Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?\",\n", + " \"answer\": \"Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74 chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.\",\n", + " },\n", + " {\n", + " \"question\": \"Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?\",\n", + " \"answer\": \"Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.\",\n", + " },\n", + " {\n", + " \"question\": \"Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?\",\n", + " \"answer\": \"He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so in total he has 7 + 2 = 9 toys. The answer is 9.\",\n", + " },\n", + " {\n", + " \"question\": \"There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?\",\n", + " \"answer\": \"There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 = 20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers. The answer is 29.\",\n", + " },\n", + " {\n", + " \"question\": \"Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?\",\n", + " \"answer\": \"Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. 
The answer is 33.\",\n", + " },\n", + " {\n", + " \"question\": \"Olivia has $23. She bought five bagels for $3 each. How much money does she have left?\",\n", + " \"answer\": \"She bought 5 bagels for $3 each. This means she spent 5\",\n", + " },\n", + "]\n", + "\n", + "@text.prompt\n", + "def few_shot_prompt(question, examples):\n", + " \"\"\"\n", + " {% for example in examples %}\n", + " Q: {{ example.question }}\n", + " A: {{ example.answer }}\n", + " {% endfor %}\n", + " Q: {{ question }}\n", + " A:\n", + " \"\"\"\n", + "\n", + "# Prompt functions can be partially evaluated like any other function\n", + "gsm8k_prompt = ft.partial(few_shot_prompt, examples=examples)" + ] + }, + { + "cell_type": "markdown", + "id": "1eae0ec8-89f0-43fc-b055-6fcd64cbc03b", + "metadata": {}, + "source": [ + "## When `text-davinci-003` is uncertain" + ] + }, + { + "cell_type": "markdown", + "id": "a273ed78-e813-467e-85f3-16d7f283ba87", + "metadata": {}, + "source": [ + "Let us now sample 20 completions with the `text-davinci-003` model. Outlines is sampling-first, and allows you to draw several samples with both OpenAI and `transformers` models easily:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "beff960d-6833-4f24-af09-5b65886a9549", + "metadata": {}, + "outputs": [], + "source": [ + "model = models.text_completion.openai(\"text-davinci-003\", max_tokens=128)\n", + "\n", + "question = \"When I was 6 my sister was half my age. Now I’m 70 how old is my sister?\"\n", + "prompt = gsm8k_prompt(question)\n", + "answers = model(prompt, samples=20)" + ] + }, + { + "cell_type": "markdown", + "id": "1a895b6d-d4d4-40f9-9156-24ba7e21cc08", + "metadata": {}, + "source": [ + "The correct answer to this question is 35. Let us now count the different answers, and take a look at their distribution. 
Let us first define a few utility functions:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "f1c83d1f-a478-4509-890e-b84a2e0d8846", + "metadata": {}, + "outputs": [], + "source": [ + "def count_digits(answers):\n", + " digits = []\n", + " for answer in answers:\n", + " try:\n", + " match = re.findall(r\"\\d+\", answer)[-1]\n", + " digits.append(int(match))\n", + " except IndexError:\n", + " print(f\"Could not parse the completion: '{answer}'\")\n", + " \n", + " unique_digits, counts = np.unique(digits, return_counts=True)\n", + " return {d: c for d, c in zip(unique_digits, counts)}\n", + "\n", + "def plot_counts(counts):\n", + " fig = plt.figure(figsize=(12,8))\n", + " ax = fig.add_subplot(111)\n", + " \n", + " bar = ax.bar(counts.keys(), counts.values())\n", + " ax.spines[[\"right\", \"top\", \"left\"]].set_visible(False)\n", + " ax.get_yaxis().set_visible(False)\n", + " \n", + " for rect in bar:\n", + " height = rect.get_height()\n", + " plt.text(rect.get_x() + rect.get_width() / 2.0, height, f'{height:.0f}', ha='center', va='bottom', fontsize=20)\n", + " \n", + " ax.set_xticks(list(counts.keys()))\n", + " ax.set_xlabel(\"Answer\")\n", + "\n", + "def entropy(counts):\n", + " counts = np.array(list(counts.values()))\n", + " probs = counts / np.sum(counts)\n", + " log_probs = np.log(probs)\n", + " return - np.sum(probs * log_probs)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "88668e09-bcd6-4a6a-83a5-838189b910eb", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAqsAAAHgCAYAAACCbCTDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVwklEQVR4nO3df7DldX3f8dcbFtmVLb9cDDgLrB0DjNWUlWQzWBaDtEJg0BBiJk5VsKLFKRSsU7u2M8wKZouDjsg4Y0chBn9MTfgh3QkapQiCHSJWFigBAlMgggUkJlVJDXXh0z/2ILuwF9LZe+95372Px8yZvff7PeznvcB893m/53vOt8YYAQCAjnaZ9gAAADATsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtLXkRfb7XCsAAOZDbW+jM6sAALQlVgEAaEusAgvGddddl5NPPjn7779/dt9997ziFa/Icccdl69+9avTHg3YiTn2TNeLXbMK0MIHP/jBXHjhhVm5cmXe/OY3Z8WKFXn88cfzve99LzfccENOOOGEaY8I7IQce6avxnjB91B5gxUwdZ/97Gfz3ve+N6eeemo+85nP5CUveck2+3/+859nt912m9J0wM7KsWfebfcNVmIVaO3JJ5/MgQcemGXLluW+++573l8WAHPBsWcqthurLgMAWrv22mvz+OOP55xzzskuu+ySa665JnfeeWeWLl2aNWvW5Mgjj5z2iMBOyLGnD7EKtPbd7343SbJ06dKsXr06d9555zb7jz766FxxxRXZb7/9pjEesJNy7OnDpwEArf3whz9Mklx44YWpqtx000356U9/mjvuuCNvetObcuONN+atb33rlKcEdjaOPX2IVaC1p59+OkmyZMmSbNy4MUcddVSWL1+e1772tfnKV76SlStX5lvf+lZuvvnmKU8K7Ewce/oQq0Bre++9d5Jk9erVWbVq1Tb7XvrSl+a4445Lktxyyy3zPBmwM3Ps6UOsAq0deuihSZ79i+O59tlnnyTJz372s/kaCVgEHHv6EKtAa8cee2yqKnfdddcvXpbb2jNvenjlK18536MBOzHHnj7EKtDawQcfnJNOOinf//7388lPfnKbfd/4xjfy9a9/PXvvvXeOP/74KU0I7Iwce/pwUwCgvYcffjivf/3r89BDD+XYY4/N6tWr88ADD+Tqq69OVeXLX/5yTjnllGmPCexkHHvmnTtYAQvX448/nvPOOy8bN27MI488kj333DNr167Nhz70oaxZs2ba4wE7KceeeSVWAQBoa7ux6ppVAADaEqsAALQlVgEAaGvJtAcAeDGr1l0z474HLzhxHicBFhPHnh6cWQUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAU
AoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCAAvaF7/4xVRVqiqXXHLJtMdhlolVAGDBeuihh3LmmWdm+fLl0x6FOSJWAYAFaYyRd73rXXnZy16WM844Y9rjMEfEKgCwIF188cX55je/mc997nPZY489pj0Oc0SsAgALzt13351169bl7LPPztFHHz3tcZhDYhUAWFA2b96cd7zjHTnooIOyYcOGaY/DHFsy7QEAAP5/nHfeedm0aVO+/e1vZ9myZdMehznmzCoAsGB85zvfyYYNG/KBD3wgRx555LTHYR6IVQBgQdi8eXPe+c535pBDDsn5558/7XGYJ2IVAFgQnnjiidx77725++67s3Tp0l/cCKCq8uEPfzhJ8p73vCdVlXPOOWe6wzJrXLMKACwIu+++e9797ndvd9+tt96aTZs25aijjsqhhx7qEoGdiFgFABaEZcuWzXg71fXr12fTpk059dRTc/rpp8/zZMwllwEAANCWWAUAoC2xCgAseOvXr88YwyUAOyGxCgBAW2IVAIC2xCoAAG356CoAYMFYte6aF9z/4AUnztMkzBdnVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAAIvEj370o1xyySU5+eST86pXvSrLli3LXnvtlaOOOiqXXnppnn766WmP+DxLpj0AAADz4/LLL8/73ve+HHDAATnmmGNy0EEH5bHHHstVV12V008/PV/72tdy+eWXp6qmPeoviFUAgEXikEMOycaNG3PiiSdml12efYF9w4YNWbNmTa688spcddVVOeWUU6Y45bZcBgAAsEi88Y1vzEknnbRNqCbJ/vvvnzPOOCNJcsMNN0xhspmJVQAAsttuuyVJlizp9cK7WAUAWOQ2b96cz3/+80mS448/fsrTbEusAgAscuvWrcudd96ZE044Iccdd9y0x9mGWAUAWMQuvvjifPzjH89hhx2WL3zhC9Me53nEKgDAIvWpT30qZ599dl796lfn+uuvz7777jvtkZ5HrAIALEIXXXRRzjrrrLzmNa/J9ddfn/3333/aI22XWAUAWGQ++tGP5v3vf38OP/zwXH/99Xn5y18+7ZFmJFYBABa
R888/P+vWrcsRRxyR6667LitWrJj2SC+o1wdpAQAwZy677LKce+652XXXXbN27dpcfPHFz3vOqlWrctppp83/cDMQqwAAi8QDDzyQJHnqqady0UUXbfc5b3jDG1rFqssAAAAWifXr12eM8YIPt1sFAIC/J7EKAEBbYhUAgLa8wQoAYBFZte6aGfc9eMGJ8zjJ348zqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoq2WsXnHFFTnrrLOydu3a7LnnnqmqvP3tb5/2WAAAc0L7zGzJtAfYno985CO5/fbbs3z58qxcuTL33HPPtEcCAJgz2mdmLc+sfuITn8i9996bn/zkJ/n0pz897XEAAOaU9plZyzOrxxxzzLRHAACYN9pnZi3PrAIAQCJWAQBoTKwCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2mp5U4Crr746V199dZLk0UcfTZLcfPPNOe2005IkK1asyMc+9rEpTQcAMLu0z8xaxuptt92Wyy67bJtt999/f+6///4kycEHH7xo/4MBADsf7TOzlpcBrF+/PmOMGR8PPvjgtEcEAJg12mdmLWMVAAASsQoAQGNiFQCAtlq+wSpJVq27ZsZ9D15w4jxOAgAwt3TPzJxZBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG3VGGPmnVV/mmTF/I0zoxVJ/mraQwBtOCYAz5jP44G15tZfjTGOf+7GF4zVLqrqv48xfnXacwA9OCYAz5jP44G1psNlAAAAtCVWAQBoa6HE6memPQDQimMC8Iz5PB5YawoWxDWrAAAsTgvlzCoAAItQ+1itqr2r6oqquqeq7q6qI6c9EzA/qmppVd1SVbdX1Z9X1Ycn2/+wqh6oqtsmj8OnPCowD16oCarqA1U1qmpWPnJze2tV1R9tddx5sKpum4V1Dt3q97ytqn5SVedU1b5VdW1V3Tf5dZ85XOv8qrpjsu0bVfWKHV1rNrW/DKCqLkty0xjjkqp6SZKXjjH+95THAuZBVVWSPcYYT1TVbkm+neTsJGck+ZMxxhVTHRCYVzM1QVUdmOSSJIclOWKMscOfGfpi/VFVH0/y4zHGeTu61la/565JfpDk15P8qyR/Pca4oKrWJdlnjPHv5mitvxlj/GSy/V8nefUY44zZWmtHtT6zWlV7JTk6yaVJMsb4v0IVFo+xxROTb3ebPHr/hA3MiRdpgk8k+WBm6fjwYv0x+UH6d5P859lYbyvHJvmfY4y/TPKWJJdNtl+W5Lf
maq1nQnVijzQ7zraO1SSvTPJ4ks9V1aaquqSq9pj2UMD8qapdJy+1/TDJtWOM70x2/f7kZatPVNXu05sQmCfbbYKqekuSH4wxbp/rtbbavzbJY2OM+2ZxzST5vTwbwL80xnhk8vWjSX5pDtdKVf1+VT2U5J8nOXeW19oh3WN1SZLXJfn0GGN1kr9Nsm66IwHzaYzx1Bjj8CQrk6ypqtck+VC2vNz3a0n2TTJrL40BbW2vCdYn+feZ/bh6sf54W2b5rOrkUoM3J7n8ufvGlms2Z+1s5/bWGmP8hzHGgUm+lOTM2VprNnSP1YeTPLzVmZQrsuV/HmCRmbwEd32S48cYj0wuEXgyyeeSrJnqcMB8mKkJXpnk9qp6MFt+qL21qvafo7VSVUuS/HaSP9rBNZ7rN5PcOsZ4bPL9Y1V1wGTNA7Ll1aW5WmtrX0pyyiyutcNax+oY49EkD1XVoZNNxya5a4ojAfOoqvarqr0nXy9L8s+S3LPVAbyy5TquO6c1IzA/ZmiCW8cYLx9jrBpjrMqWyHzd5LmzvdYz/fFPk9wzxnh4R9bYjueerd2Y5NTJ16cm+S9ztVZV/fJW+96S5J5ZXGuHLYRPAzg8W97h95Ik9yd51xjjb6Y6FDAvqupXsuWNBbtmyw/XfzzGOK+qvplkvySV5LYkZ2z1RixgJ/ViTTA5u/qrs/RpANtdq6r+MMmfjTH+046usdVaeyT5fpJ/OMb48WTby5L8cZKDkvxlkt8dY/z1HK11ZZJDkzw9WeuMMcYPdnSt2dI+VgEAWLxaXwYAAMDiJlYBAGhLrAIA0JZYBQCgLbEKAEBbYhXgOarqt6pqVNVh054FYLETqwDP97Yk3578OhWTu+QALHpiFWArVbU8yVFJ3p3k9ybbfqOqbqiqK6rqnqr60uTuWamqC6rqrqq6o6o+VlW7VtUDtcXeVfVUVR09ee6NVfXLVbVHVf1BVd1SVZuq6i2T/adV1cbJTQ+um86/AYBe/OQOsK23JPnTMca9VfWjqjpisn11kn+U5H8l+W9J/klV3Z3k5CSHjTFGVe09xniqqv4iyauz5Z7ltyZZW1XfSXLgGOO+qtqQ5JtjjH8xuZ3sLVX1XyfrvC7Jr8zGnWoAdgbOrAJs621Jvjz5+st59lKAW8YYD48xns6WW7yuSvLjJH+X5NKq+u0k/2fy3JuSHD15/MdsOVP7a0m+O9n/piTrquq2JDckWZott1RMkmuFKsCznFkFmKiqfZO8Mclrq2ok2TXJSHJNkie3eupTSZaMMTZX1Zokxyb5nSRnTv75G5O8L8krkpyb5N8m+Y1sidgkqSSnjDH+4jnr/3qSv52TPxzAAuXMKsCzfifJF8YYB48xVo0xDkzyQJK123vy5PrWvcYYX03y/iT/eLLrliSvT/L0GOPvsuVM7L/MlohNkq8nOWur615Xz9GfB2DBE6sAz3pbkq88Z9uVmflTAf5Bkj+pqjuy5dMD/k2SjDGeTPJQkj+bPO+myXP/x+T785PsluSOqvrzyfcAbEeNMaY9AwAAbJczqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2vp/jdj4sUZoV2sAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "counts = count_digits(answers)\n", + "plot_counts(counts)" + ] + }, + { + "cell_type": "markdown", + "id": "661a1135-ac2d-4a49-a786-d04a7ba68b48", + "metadata": {}, + "source": [ + "We see that there is an important variabilty in the answers given by `text-davinci-003`. Depending on the number of samples taken, even self-consistency sampling may lead to the wrong result here." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "30ea0dfe-6c15-44f0-881c-88b325542b44", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Entropy: 1.5741030017371853\n" + ] + } + ], + "source": [ + "print(f\"Entropy: {entropy(counts)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "0b15b230-b667-4c9c-8a5d-366dd61de9b7", + "metadata": {}, + "source": [ + "## `text-davinci-003` on an easier question" + ] + }, + { + "cell_type": "markdown", + "id": "beae30f0-4168-4a80-90d4-d26a4f476469", + "metadata": {}, + "source": [ + "Let us now look at the results for an arguably easier question:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "7e106b94-2dfd-4a75-b4d9-b1ad693418a7", + "metadata": {}, + "outputs": [], + "source": [ + "question = \"John buys 2 pairs of shoes for each of his 3 children. They cost $60 each. 
How much did he pay?\"\n", + "prompt = gsm8k_prompt(question)\n", + "answers = model(prompt, samples=20)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "dd46fb2b-08ef-4003-8d03-ea0f39c865c4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Entropy: 0.1985152433458726\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAqsAAAHgCAYAAACCbCTDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy9bCgiHAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQuklEQVR4nO3db6ye9V3H8c+PPxGxIjMVghSLkCYgOtmUzSCd6xii8mCIyxBCTBUTWbBmIzEhWeIAUfegSYkha6JIQsmSRaoclnUbmeCyQbrBoi2GrtSlEIZKxnQOCmPB9vLBuaH8aUui4dyfwuuVnPQ+13Wd3N/TR+/++ruva0zTFAAAaHTEvAcAAICDEasAANQSqwAA1BKrAADUEqsAANQSqwAA1Drqdc67rxUAAEthHOiglVUAAGqJVQAAaolVAIA3sc2bN2fdunVZvXp1jjvuuIwxcsUVVxz0+meeeSYf+9jHcsYZZ+SYY47J2972tlx44YW55557lnDq/V5vzyoAAIexG2+8Mdu3b8+yZcuyYsWK7Ny586DXfve73815552XHTt25KyzzspVV12VPXv25K677sr73//+3HLLLbnyyiuXcHorqwAAb2obNmzIrl278vTTT2fjxo2HvPa6667Ljh07cskll2Tbtm256aabcsstt+Thhx/OKaecknXr1uWJJ55YoskXiVUAgDexNWvWZNWqVRnjgB+2f4U777wzSXLDDTfkqKP2/wf8CSeckGuuuSbf//73c+utt75hsx6IWAUAIEny5JNPJklOO+2015x78dhS710VqwAAJEmWL1+eJHn00Udfc2737t1JkkceeWRJZxKrAAAkSS666KIkycc//vHs3bv3peNPPfVUNmzYkGTxQ1hLyd0AAABIsrhX9e67787mzZtz9tln5/zzz8+zzz6bu+66KyeffHIef/zxHHHE0q51WlkFACBJctJJJ+XBBx/M1VdfnWeeeSaf/OQns2XLllx66aW54447kix+2GopWVkFAOAlJ554Ym6++ebcfPPNrzh+7733JknOOeecJZ3HyioAAK9r06ZNSZLLL798Sd9XrAIAkCTZt29f9uzZ85rjt99+ezZt2pRzzz03F1988ZLOZBsAAMCb2MLCQhYWFpLsv4/q1q1bs3bt2iSLt6tav379kuS5557LiSeemAsuuCCnn356jjjiiNx///3ZunVrzjzzzNxxxx1L/gGrMU3Toc4f8iQAAN2uu+66XH/99Qc9v3Llyjz22GNJkhdeeCFXXXVV7rvvvpceq7pq1ap86EMfykc+8pEce+yxb+SoB3zEllgFAKDBAWPVnlUAAGqJVQAAa
olVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAa
olVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGrVxOrmzZuzbt26rF69Oscdd1zGGLniiivmPRYAAHN01LwHeNGNN96Y7du3Z9myZVmxYkV27tw575EAAJizmpXVDRs2ZNeuXXn66aezcePGeY8DAECBmpXVNWvWzHsEAADK1KysAgDAq4lVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABq1TwUYGFhIQsLC0mSJ598MkmydevWrF27NkmyfPnyrF+/fk7TAQAwDzWxum3bttx2222vOLZ79+7s3r07SbJy5UqxCgDwFjOmaTrU+UOeBADgzeHUa7fksU9cNM8RxoEO2rMKAEAtsQoAQC2xCgBArdpYPfXaLfMeAQCAOauNVQAAEKsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BrTNB385BhfSLJ86cZ5heVJvjOn9wYAeCuaZ399Z5qmX3v1wUPG6jyNMb4+TdMvznsOAIC3isb+sg0AAIBaYhUAgFrNsfpX8x4AAOAtpq6/avesAgBA88oqAABvcXWxOsY4ZozxwBhj+xjj4THG9fOeCQDgcHewxhqL/myMsWuM8Y0xxh+97PhfjjG+OcZ4aIzxznnMfdQ83vR1/CDJ+6Zp2jPGODrJfWOMz0/T9NV5DwYAcBg7YGMlOTPJKUnOmKZp3xjjhNn1v55k1ezr3Uk2zv5cUnWxOi1uot0z+/bo2ZeNtQAA/w+HaKwPJ7l8mqZ9s+u+PbvmA0k2zX7uq2OM48cYJ03T9B9LOXfdNoAkGWMcOcbYluTbSb44TdPX5jwSAMBh7yCNdXqSS8cYXx9jfH6MsWp2+clJvvWyH39idmxJVcbqNE17p2k6O8mKJO8aY/zsnEcCADjsHaSxfijJ87MnV/11klvnOOJrVMbqi6Zp+u8k/5jkNc+JBQDg/+ZVjfVEkr+fnbozydtnr/8ti3tZX7RidmxJ1cXqGOMnxhjHz17/cJILkuyc61AAAIe5QzTWQpI1s8t+Jcmu2evPJPmd2V0BfinJ95Z6v2pS+AGrJCcluW2McWQWY/pvp2n67JxnAgA43B2wscYY9yX51Bjjo1n8ANbvz67/XJLfSPLNJM8l+d05zOwJVgAA9KrbBgAAAC8SqwAA1BKrAADUEqsAANQSqwAA1BKrAK8yxrh4jDGNMc6Y9ywAb3ViFeC1Lkty3+zPuRhjNN4HG2DJiVWAlxljLEtyXpIrk/z27Nh7xxhfGmNsHmPsHGN8aowxZuc+McbYMcZ4aIyxfoxx5Bjj0dkTX44fY+wdY7xndu2Xxxirxhg/Msa4dYzxwBjjn8cYH5idXzvG+MwY494k98znbwCgi3+5A7zSB5J8YZqmXWOM/xxj/MLs+DuSnJXk35Pcn+SXxxjfSPKbSc6YpmkaYxw/TdPeMcYjSX4myU8n+ackq8cYX0tyyjRN/zrG+PMk907T9HuzRx8+MMb4h9n7vDPJ26dp+q+l+oUBmllZBXily5J8evb609m/FeCBaZqemKZpX5JtSU5N8r0kzyf5mzHGJVl8HGGSfCXJe2Zff5HFldpzkjw4O/+rSa4dY2xL8qUkxyT5qdm5LwpVgP2srALMjDF+PMn7kvzcGGNKcmSSKcmWJD942aV7kxw1TdP/jDHeleT8JB9M8oezn/9ykg8n+ckkf5Lkj5O8N4sRmyQjyW9N0/TIq
97/3UmefUN+OYDDlJVVgP0+mOT2aZpWTtN06jRNpyR5NMnqA10829/6Y9M0fS7JR5P8/OzUA0nOTbJvmqbns7gS+wdZjNgkuTvJupfte33HG/T7ABz2xCrAfpclufNVx/4uB78rwI8m+ewY46Es3j3gmiSZpukHSb6V5Kuz674yu/ZfZt//aZKjkzw0xnh49j0ABzCmaZr3DAAAcEBWVgEAqCVWAQCoJVYBAKglVgEAqCVWAQCoJVYBAKglVgEAqCVWAQCo9b+lzUDoz9UHogAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "counts = count_digits(answers)\n", + "plot_counts(counts)\n", + "print(f\"Entropy: {entropy(counts)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "cf4cacdf-a31d-43bd-8517-eec9f656eee4", + "metadata": {}, + "source": [ + "The entropy of the results is much lower, we say that the model is more \"certain\" of its answers. " + ] + }, + { + "cell_type": "markdown", + "id": "22f31872-aab7-4a68-b9f2-d335a4f1a875", + "metadata": {}, + "source": [ + "## How `gpt-4` compares to `text-davinci-003`\n", + "\n", + "Let us now look at how GPT4 fares on the original question:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "2d5ab5b8-eca5-47f5-a35c-5f3865e35755", + "metadata": {}, + "outputs": [], + "source": [ + "model = models.text_completion.openai(\"gpt-4\", max_tokens=128)\n", + "\n", + "question = \"When I was 6 my sister was half my age. Now I’m 70 how old is my sister?\"\n", + "prompt = gsm8k_prompt(question)\n", + "answers = model(prompt, samples=20)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d316a5f7-cebc-4b09-9b1b-aee219b2f088", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Entropy: -0.0\n" + ] + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAqwAAAHgCAYAAABgsD+6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQI0lEQVR4nO3dYahf9X3H8c+vua6J3Ui4hKKDOpVMxTmn1Tlwatt0rKKUziZgfJa5B2tg0TkY5EEZLepWYYITwTJ0o5Wi08ypXYdjMxG76OpkLc51jQUdU4ZiHaagREg8e5C/0WhiNWlyP7l9vSDk/s/v/Lnffx6Ed05+95wxTVMAAKDVhxZ6AAAAeC+CFQCAaoIVAIBqghUAgGqCFQCAaoIVAIBqcz9h3T2vAAA4EsaBFlxhBQCgmmAFAKCaYAV4H15++eXcdtttueyyy7Jq1aosW7Ysy5cvzwUXXJDbb789b7zxxn7f9+ijj+aSSy7J/Px8li1bljPPPDM33XRTdu/efYQ/AcDRa/yER7PawwqQ5Ktf/Wo2bNiQ448/Pp/61Kdywgkn5MUXX8y9996bHTt2ZM2aNbnnnnsyxltbsO6///6sWbMmS5cuzeWXX575+fl885vfzPbt27N27drcc889C/iJAOoccA+rYAV4H7Zs2ZJXX301l156aT70obf+c+qFF17Ieeedl+eeey6bN2/OmjVrkiQ//vGPs2rVquzYsSPbtm3LueeemyTZuXNnVq9encceeyx33nln1q1btyCfB6CQH7oCOBSrV6/OZz/72X1iNUmOO+64fOELX0iSPPzww3uPb968OS+99FLWrVu3N1aTZOnSpbnuuuuSJLfeeuvhHxxgERCsAIfomGOOSZLMzb11p8AtW7YkSS6++OJ3nX/RRRfl2GOPzaOPPprXX3/9yAwJcBQTrACHYNeuXfn617+eZN843b59e5LklFNOedd75ubmctJJJ2XXrl155plnjsygAEcxwQpwCDZt2pSnnnoql1xyST7zmc/sPb5jx44kyfLly/f7vjePv/LKK4d9RoCjnWAFOEg333xzbrzxxpx22mm54447FnocgEVLsAIchFtuuSVXX311Tj/99GzdujXz8/P7rL95BfXNK63v9ObxFStWHNY5ARYDwQrwAd10003ZuHFjzjjjjGzdujXHHXfcu8459dRTkyRPP/30u9Z27dqVZ599NnNzczn55JMP+7wARzvBCvAB3HDDDbnmmmty1llnZevWrfnoRz+63/NWr16dJHnwwQfftfbII4/ktddey/nnn58Pf/jDh3VegMVAsAK8T9dee202bdqUc845Jw899FBWrlx5wHPXrl2blStX5q677soTTzyx9/jOnTvzxS9+MUmyYcOGwz4zwGLgSVcA78PXvva1rF+/PkuWLMnGjRv3+9P/J554YtavX7/39X333Ze1a9dm6dKlWbduXebn5/PAAw/sfTTr3Xffvc+jXAF+xnk0K8Ch+NKXvpQvf/nL73nOJz7xiX2edpUk27Zty/XXX5/HHnssO3fuzKpVq3LllVfmqquuypIlSw7jxABHHcEKAEC1AwarPawAAFQTrAAAVBOsAABUm1voAQ7kxE3fWugRAAB+pvz3Vy5d6BH2yxVWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAAC
qCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAAC
qCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKqNaZoOvDjGg0lWHrlxABaFlUl+tNBDABxlfjRN08X7W3jPYAXggxtjPDFN07kLPQfAYmFLAAAA1QQrAADVBCvAT99fLvQAAIuJPawAAFRzhRUAgGpzCz0AwNFsjLEiyW1JzkgyJbkyyR8mOXV2yookr0zTdNaRnw5gcRCsAIfmL5I8OE3T2jHGzyU5dpqmy99cHGPcmGTHgk0HsAjYwwpwkMYYy5N8L8nJ037+Mh1jjCT/k2T1NE0/PMLjASwa9rACHLyTkryU5K/HGN8dY9w2xvjI29YvTPKiWAU4NIIV4ODNJfl4klunaTo7yatJNr1t/Yokdy7EYACLiWAFOHjPJ3l+mqbvzF5vzp6AzRhjLsnnk/zNAs0GsGgIVoCDNE3TC0meG2O8eUeATyf5/uzr30ryg2manl+Q4QAWEXcJADg0G5N8Y3aHgGeS/O7s+LrYDgDwU+EuAQAAVLMlAACAaoIVAIBqghUAgGqCFQCAaoIVAIBqghXgHcYYvzPGmMYYpy30LAAIVoD9uSLJv8x+XxCzJ2UBEMEKsI8xxs8nuSDJ72XPzf8zxvjkGOPhMcbmMcYPxhjfGGOM2dpXxhjfH2M8Ocb48zHGkjHGs2OPFWOM3WOMi2bnPjLG+OUxxkfGGH81xnh8jPHdMcbnZuvrxxgPjDG2JHloYf4EAPr4FzzAvj6X5MFpmp4eY7w8xjhndvzsJL+S5H+TbEvym2OM/0pyWZLTpmmaxhgrpmnaPcbYnuT0JCcl+fckF44xvpPkY9M0/XCM8adJtkzTdOUYY0WSx8cY/zz7Ph9PcuY0Tf93pD4wQDtXWAH2dUWSu2Zf35W3tgU8Pk3T89M0vZHke0lOTLIjyc4kt48xPp/ktdm5305y0ezXn2XPFdtfT/Jvs/XfTrJpjPG9JA8nWZrkhNnaP4lVgH25wgowM8aYT7I6ya+OMaYkS5JMSb6V5PW3nbo7ydw0TbvGGOcl+XSStUn+YPb+R5JsSPKLSf4kyR8n+WT2hGySjCRrpmna/o7v/xtJXj0sHw7gKOYKK8Bb1ia5Y5qmX5qm6cRpmj6W5NkkF+7v5Nl+1+XTNP1DkmuS/Nps6fEk5yd5Y5qmndlzRfb3sydkk+Qfk2x82z7Ysw/T5wFYFAQrwFuuSPJ37zj2tznw3QJ+IcnfjzGezJ67CvxRkkzT9HqS55L86+y8b8/O/Y/Z62uTHJPkyTHGf85eA3AAY5qmhZ4BAAAOyBVWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKr9P8bb7HZA9fu3AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "counts = count_digits(answers)\n", + "plot_counts(counts)\n", + "print(f\"Entropy: {entropy(counts)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "2f6c8a22-fdf5-4f30-865c-8e11927b1b7c", + "metadata": {}, + "source": [ + "GPT4 returns the correct answer with certainty." + ] + }, + { + "cell_type": "markdown", + "id": "50d4a55e-86df-46ab-8b38-302c79bc8add", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "When generating text completions with a language model we typically look at one output sample, trying to find the \"right\" answer. However, doing so we obscure the diversity of answers that these language models can produce. Assuming the diversity of answers reflects these models' \"uncertainty\", we can use measures such as the entropy of the answers' distribution to evaluate the quality of the answer.\n", + "\n", + "Which result should we be choosing once we have different samples? There is no definite answer to this question. The [self-consistency method](https://arxiv.org/abs/2203.11171) consists in choosing the result based on a majority vote. We think this choice is arbitrary and that choosing the correct answer is a [decision theory](https://en.wikipedia.org/wiki/Decision_theory) problem, which can only be solved by specifying a loss function that is adapted to the experiment's context; the majority vote being a particular case with a 0-1 loss." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/simulation_based_inference.ipynb b/examples/simulation_based_inference.ipynb new file mode 100644 index 000000000..e6b999582 --- /dev/null +++ b/examples/simulation_based_inference.ipynb @@ -0,0 +1,370 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e7c7d0bb-8d45-4139-a584-02c7196db92b", + "metadata": {}, + "source": [ + "# Find the best few-shot examples using simulation-based inference" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "831a76f5-c569-4174-adab-fb0245877367", + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import random\n", + "import requests\n", + "import re\n", + "\n", + "import outlines.models as models\n", + "import outlines.text as text\n", + "\n", + "random.seed(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "ec604edc-c8b6-4088-bf17-b77ae57d05a1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "env: OPENAI_API_KEY=# your key here\n" + ] + } + ], + "source": [ + "%env OPENAI_API_KEY = # your key here" + ] + }, + { + "cell_type": "markdown", + "id": "aabb4db6-fd94-4c42-ab7f-97c3de45b2cc", + "metadata": {}, + "source": [ + "In this example we will use GPT 3.5 to solve problems from the GSM-8K dataset. The state-of-the-art performance on this dataset is obtained using few-shot prompting with 5 examples. However, it is not clear how one should select these examples. 
Here, we will use **simulation-based inference** to infer which examples we should use to get the best out of the model's ability to solve these problems.\n", + "\n", + "Let's start by downloading the dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "367f5f89-8e5d-4381-b9eb-78c60bc50f86", + "metadata": {}, + "outputs": [], + "source": [ + "result = requests.get(\"https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/train.jsonl\")\n", + "lines = result.iter_lines()" + ] + }, + { + "cell_type": "markdown", + "id": "ef0f7aa9-d528-41e9-8a9d-4497f01f0692", + "metadata": {}, + "source": [ + "We now divide the train set into two sets:\n", + "- 10 problems from which we are going to sample 5 examples at random for every inference;\n", + "- 500 problems that we are going to use to perform inference." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "0667c4a8-cebe-4796-bbc9-575ee9498717", + "metadata": {}, + "outputs": [], + "source": [ + "example_set = []\n", + "for _ in range(10):\n", + " line = json.loads(next(lines))\n", + " answer = re.findall(r\"\\d+\", line[\"answer\"])[-1]\n", + " example_set.append({\"question\": line[\"question\"], \"answer\": answer})\n", + "\n", + "train_set = []\n", + "for _ in range(500):\n", + " line = json.loads(next(lines))\n", + " answer = re.findall(r\"\\d+\", line[\"answer\"])[-1]\n", + " train_set.append({\"question\": line[\"question\"], \"answer\": answer})" + ] + }, + { + "cell_type": "markdown", + "id": "4b52b470-d818-495a-a6e3-e50a1deff13c", + "metadata": {}, + "source": [ + "Now let's define the prompt, the model, and the sampling loop. 
The sampling loop consists of choosing 5 examples at random and sampling 20 answers from the model; if an answer is correct we keep the chosen example ids as samples, otherwise we continue:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "9fbebaa9-f05e-4c6b-8875-73a08273bbb5", + "metadata": {}, + "outputs": [], + "source": [ + "@text.prompt\n", + "def few_shots(question, examples):\n", + " \"\"\"\n", + " {% for example in examples %}\n", + " Q: {{ example.question }}\n", + " A: {{ example.answer }}\n", + " {% endfor %}\n", + " Q: {{ question }}\n", + " A:\n", + " \"\"\"\n", + "\n", + "model = models.text_completion.openai(\"text-davinci-003\", max_tokens=128)\n", + "\n", + "# TODO: This could largely benefit from vectorization in #52\n", + "def one_train_example(problem, example_set):\n", + " example_ids = random.choices(range(0, len(example_set)), k=5)\n", + " examples = [example_set[i] for i in example_ids]\n", + " prompt = few_shots(problem[\"question\"], examples)\n", + " answers_raw = model(prompt, samples=20)\n", + "\n", + " samples = []\n", + " for answer_raw in answers_raw:\n", + " try:\n", + " answer = re.findall(r\"\\d+\", answer_raw)[-1]\n", + " if answer == problem[\"answer\"]:\n", + " samples += example_ids\n", + " else:\n", + " continue\n", + " except IndexError:\n", + " pass\n", + "\n", + " return samples" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1dae1ef2-c9e0-4c98-8686-7fbc2ff55e56", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9efc9d077af24a2eb5ea3c05fe63f298", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + " 0%| | 0/500 [00:00
fig.add_subplot(111)\n", + "ax.bar(example_ids, counts)\n", + "\n", + "ax.spines[[\"top\", \"right\"]].set_visible(False)\n", + "\n", + "ax.set_xticks(range(10))\n", + "ax.set_xlabel(\"Example #\")\n", + "ax.set_ylabel(\"Counts\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "cde37e5b-377e-4872-af40-674d680bd2da", + "metadata": {}, + "source": [ + "Looking at the distribution, our best guess for which examples we should use for benchmarking on the test set would be 0, 1, 2, 6 and 9. This method can be trivially extended to other workflows that use few-shot examples to query LLMs. Of course, simulation-based inference extends beyond choosing the \"best\" prompt, and could for instance be useful to select the structure of chains of LLMs and tools as well." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "bddda20b-234a-4d30-b40a-90708fbaba23", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?',\n", + " 'answer': '72'}" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "example_set[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "fb186bf9-62b7-485f-a8ce-401f551a9e57", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'question': 'Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?',\n", + " 'answer': '10'}" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "example_set[1]" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "ae427bb2-e3f4-4a96-a508-e8011a0fc553", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'question': 'Betty is saving money for a new wallet which costs $100. 
Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?',\n", + " 'answer': '5'}" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "example_set[2]" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "fe43ae0f-c18f-4b74-b639-8481472edf4d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'question': 'Albert is wondering how much pizza he can eat in one day. He buys 2 large pizzas and 2 small pizzas. A large pizza has 16 slices and a small pizza has 8 slices. If he eats it all, how many pieces does he eat that day?',\n", + " 'answer': '48'}" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "example_set[6]" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "19d9d936-d0f0-4927-990c-76dbbfa95b47", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'question': 'Tina makes $18.00 an hour. If she works more than 8 hours per shift, she is eligible for overtime, which is paid by your hourly wage + 1/2 your hourly wage. If she works 10 hours every day for 5 days, how much money does she make?',\n", + " 'answer': '990'}" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "example_set[9]" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}