# Databricks notebook source
# MAGIC %md
# MAGIC
# MAGIC ## Few Shot Learning
# MAGIC <hr/>
# MAGIC <img src="https://promptengineeringdbl.blob.core.windows.net/img/header.png"/>
# MAGIC <hr/>
# MAGIC
# MAGIC In Machine Learning, **Few Shot Learning** (FSL) is a framework that enables a pre-trained model to generalize to new categories of data (categories it has not seen during training) using only a few labeled samples per class. It falls under the paradigm of meta-learning (learning to learn).
# MAGIC
# MAGIC Within **Prompt Engineering**, FSL is also known as **Few Shot Prompting** - you include some examples related to the task that you want the model to perform as part of your prompts.
# MAGIC
# MAGIC ### Few Shot Learning Example
# MAGIC
# MAGIC It can be tricky for LLMs to do sentiment analysis when there are aspects in the prompt such as *ambiguity* or *irony*.
# MAGIC
# MAGIC #### Sentiment Classification: Zero-Shot Learning
# MAGIC
# MAGIC Suppose you tried **Zero Shot Learning** for a **Sentiment Analysis** problem with an LLM of your choice. You noticed that the performance is below what you expect:
# MAGIC
# MAGIC * **Instruction**: *Please classify the following sentence as either POSITIVE or NEGATIVE, depending on the overall sentiment: "This movie is good, considering that you think it's good to sit in the movie theater sleeping for 2 hours."*
# MAGIC * **LLM Answer**: POSITIVE (expected classification: NEGATIVE)
# MAGIC
# MAGIC #### Sentiment Classification: Few-Shot Learning
# MAGIC
# MAGIC You then try to provide some classification examples containing *irony*, so that the model is able to distinguish those and properly classify similar ones:
# MAGIC <br/>
# MAGIC <br/>
# MAGIC
# MAGIC * **Instruction**: *Given the following sentiment classification examples:*
# MAGIC <br/>
# MAGIC
# MAGIC *'I liked this movie as much as I like standing in the rain' (NEGATIVE)*
# MAGIC <br/>
# MAGIC
# MAGIC *'If you would like to spend 10 USD for nothing then this movie is good for you' (NEGATIVE)*
# MAGIC
# MAGIC *Please classify the following sentence as either POSITIVE, NEUTRAL or NEGATIVE, depending on the overall sentiment.*
# MAGIC <br/>
# MAGIC
# MAGIC *"This movie is good, considering that you think it's good to sit in the movie theater sleeping for 2 hours."*
# MAGIC <br/>
# MAGIC <br/>
# MAGIC * **LLM Answer**: NEGATIVE (expected classification: NEGATIVE)
# MAGIC
# MAGIC In the cells below, we will try to apply the same concept to an **intent classification** dataset. For this purpose, we will experiment with MosaicML's `mpt-7b-instruct` model.
# MAGIC
# MAGIC A potential caveat with Few Shot Learning is that, with all the few shot examples we provide, our prompt might exceed the maximum **context length** of a particular model.
# MAGIC
# MAGIC The MPT family of models allows you to configure the maximum context window size, which mitigates this issue. Of course, it is also possible to extend context window sizes by fine-tuning LLMs, but we will skip that step for this exercise.
# COMMAND ----------
# DBTITLE 1,Declaring and Configuring MPT-7b
import transformers
import torch
# It is suggested to pin the revision commit hash and not change it, for
# reproducibility, because the uploader might update the model afterwards;
# you can find the commit history of mpt-7b-instruct at
# https://huggingface.co/mosaicml/mpt-7b-instruct/commits/main
name = "mosaicml/mpt-7b-instruct"

config = transformers.AutoConfig.from_pretrained(
    name,
    trust_remote_code=True
)
config.attn_config['attn_impl'] = 'triton'
config.init_device = 'cuda'
config.max_seq_len = 4000

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    cache_dir="/local_disk0/.cache/huggingface/",
    revision="bbe7a55d70215e16c00c1825805b81e4badb57d7"
)

# mpt-7b-instruct uses the EleutherAI/gpt-neox-20b tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    padding_side="left"
)

generator = transformers.pipeline(
    "text-generation",
    model=model,
    config=config,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device=0
)
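# COMMAND ----------
# DBTITLE 1,Sanity-Checking the Pipeline (optional)
# A minimal, illustrative check (the sample prompt below is just a placeholder):
# we verify that a prompt fits within the configured context window and that
# the pipeline generates text at all.
sample_prompt = "Classify the sentiment of: 'I liked this movie as much as I like standing in the rain.'"
n_tokens = len(tokenizer(sample_prompt)["input_ids"])
print(f"Prompt length: {n_tokens} tokens (max_seq_len = {config.max_seq_len})")
output = generator(sample_prompt, max_new_tokens=20, do_sample=False)
print(output[0]["generated_text"])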
# COMMAND ----------
# MAGIC %md
# MAGIC
# MAGIC ## Output Formats
# MAGIC
# MAGIC Different LLMs have distinct behaviours when it comes to the output format.
# MAGIC
# MAGIC For instance, we might want our LLM to return JSON-formatted outputs, so that we can easily transform them into a Pandas or Spark DataFrame, persist them somewhere, and evaluate them later on.
# MAGIC
# MAGIC To make this reliable, we'll leverage [JSONFormer](https://github.com/1rgs/jsonformer), a lightweight Python package that constrains generation so that the model's output always conforms to a given JSON schema.
# MAGIC
# MAGIC Sample usage:
# MAGIC
# MAGIC ```python
# MAGIC from jsonformer import Jsonformer
# MAGIC from transformers import AutoModelForCausalLM, AutoTokenizer
# MAGIC
# MAGIC model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b")
# MAGIC tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b")
# MAGIC
# MAGIC json_schema = {
# MAGIC     "type": "object",
# MAGIC     "properties": {
# MAGIC         "name": {"type": "string"},
# MAGIC         "age": {"type": "number"},
# MAGIC         "is_student": {"type": "boolean"},
# MAGIC         "courses": {
# MAGIC             "type": "array",
# MAGIC             "items": {"type": "string"}
# MAGIC         }
# MAGIC     }
# MAGIC }
# MAGIC
# MAGIC prompt = "Generate a person's information based on the following schema:"
# MAGIC jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
# MAGIC generated_data = jsonformer()
# MAGIC
# MAGIC print(generated_data)
# MAGIC ```
# COMMAND ----------
# DBTITLE 1,Declaring our Generation Wrapper Function
from jsonformer import Jsonformer
def generate_text(prompt, **kwargs):
    # Schema that every generated response must conform to
    json_schema = {
        "type": "object",
        "properties": {
            "utterance": {"type": "string"},
            "intent": {"type": "string"}
        }
    }
    # Jsonformer drives token generation itself, so most pipeline kwargs
    # (max_new_tokens, top_p, top_k, do_sample, ...) do not apply here;
    # temperature is the one sampling parameter it exposes
    temperature = kwargs.get("temperature", 1.0)
    if isinstance(prompt, str):
        jsonformer = Jsonformer(model, tokenizer, json_schema, prompt, temperature=temperature)
        response = jsonformer()
    elif isinstance(prompt, list):
        response = []
        for prompt_ in prompt:
            jsonformer = Jsonformer(model, tokenizer, json_schema, prompt_, temperature=temperature)
            response.append(jsonformer())
    else:
        raise TypeError("prompt must be a string or a list of strings")
    return response
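# COMMAND ----------
# DBTITLE 1,Trying the Wrapper (optional)
# A quick, illustrative call with a toy prompt: the wrapper should return a
# dict whose "utterance" and "intent" keys are enforced by the JSON schema.
sample = generate_text("Return a JSON object for the utterance 'where is my order?' and its intent.")
print(sample)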
# COMMAND ----------
# MAGIC %md
# MAGIC
# MAGIC ## The Dataset
# MAGIC <br/>
# MAGIC
# MAGIC * For this **intent classification** example, we will use a **customer support intent dataset** from Hugging Face.
# MAGIC * Once we have downloaded it, we will save it as a Delta table, so that we don't need to download it again. That also gives us an optimized format should we want to do any preprocessing, visualization, etc.
# COMMAND ----------
# MAGIC %sql
# MAGIC
# MAGIC CREATE DATABASE IF NOT EXISTS prompt_engineering
# COMMAND ----------
# DBTITLE 1,Downloading our Dataset from Hugging Face and Saving to Delta
import datasets
ds = datasets.load_dataset("bitext/customer-support-intent-dataset")
df = ds["train"].to_pandas()
sdf = spark.createDataFrame(df)
sdf.write.saveAsTable("prompt_engineering.customer_support_intent", mode = "overwrite")
# Take a small, random sample from this dataset and display it
display(df.sample(frac=0.2, random_state = 123).head(10))
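# COMMAND ----------
# DBTITLE 1,Reloading the Dataset from Delta (optional)
# On later runs we can skip the Hugging Face download and read the persisted
# Delta table back instead; a minimal sketch using the table created above.
df_reloaded = spark.table("prompt_engineering.customer_support_intent").toPandas()
print(df_reloaded.shape)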
# COMMAND ----------
# MAGIC %md
# MAGIC
# MAGIC ## Selecting our Few Shot Examples
# MAGIC <br/>
# MAGIC
# MAGIC * The dataset we're using contains multiple different intents.
# MAGIC * For our experiment to be minimally successful, we need to provide at least a few examples for each and every intent.
# MAGIC * For simplicity, we will select the top 10 most frequent intents. Then, for each of these top 10 intents, we will randomly sample 3 different utterances/questions.
# MAGIC * Concretely, our dataframe containing few shot examples will be our **training set** (although we're not really *training* our model per se; this will just be our convention).
# MAGIC * The remaining part of our dataset will be our **testing set**, which we'll use to evaluate how our Few Shot experiment performs.
# COMMAND ----------
# DBTITLE 1,Creating a sampled dataset for Few Shot Examples
import pandas as pd
import numpy as np
def get_top_intents_samples(df, k = 10, random_state = 123, n_samples = 5):
    df_arr = []
    # Take the k most frequent intents (value_counts sorts by frequency)
    top_intents = df.intent.value_counts().index[:k]
    for intent in top_intents:
        sample = df.loc[df.intent == intent, :].sample(
            n = n_samples,
            random_state = random_state
        )
        df_arr.append(sample)
    df_sampled = pd.concat(df_arr, axis = 0)
    # Shuffle the rows (with a fixed seed for reproducibility)
    return df_sampled.sample(frac = 1.0, random_state = random_state)
df_train = get_top_intents_samples(
    df = df,
    random_state = 123,
    n_samples = 3
)
df_test = get_top_intents_samples(
    df = df[~np.isin(df.index, df_train.index)],
    random_state = 234,
    n_samples = 5
)
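# COMMAND ----------
# DBTITLE 1,Inspecting the Train/Test Split (optional)
# Illustrative checks: we expect 3 utterances per intent in the training set,
# 5 per intent in the testing set, and no index overlap between the two.
print(df_train.intent.value_counts())
print(df_test.intent.value_counts())
assert not df_train.index.isin(df_test.index).any()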
# COMMAND ----------
# DBTITLE 1,Creating Prompts using Few Shot Learning
import tqdm
def train_predict(prompt_template, df_train, df_test):
    # Few shot examples as a list of {"utterance": ..., "intent": ...} records
    examples = df_train.to_dict(orient="records")
    utterances = df_test.utterance.values
    intents = df_test.intent.values
    utterances_intents = list(zip(utterances, intents))
    result = []
    for utterance_intent in tqdm.tqdm(utterances_intents):
        formatted_prompt = prompt_template.format(
            intents = df_train.intent.unique(),
            examples = examples,
            utterance = utterance_intent[0]
        )
        # Use a low temperature to keep generations close to deterministic
        # (Jsonformer ignores the remaining sampling kwargs)
        response = generate_text(
            prompt = formatted_prompt,
            temperature = 0.1,
            top_p = 0.15,
            top_k = 5,
            do_sample = True
        )
        response["actual_intent"] = utterance_intent[1]
        result.append(response)
    return result
prompt_template = """
Below you will find an array containing three examples of utterances for each of these intents. Each example is in JSON format:
{examples}
Based on the examples above, return a JSON object with the actual utterance, and the right intent for the utterance below:
'{utterance}'
"""
result = train_predict(prompt_template, df_train, df_test)
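# COMMAND ----------
# DBTITLE 1,Inspecting a Formatted Prompt (optional)
# Purely illustrative: print the exact prompt the model receives for the first
# test utterance, which helps when debugging few shot formatting issues.
print(
    prompt_template.format(
        intents = df_train.intent.unique(),
        examples = df_train.to_dict(orient="records"),
        utterance = df_test.utterance.values[0]
    )
)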
# COMMAND ----------
# DBTITLE 1,Analysing Results
from sklearn.metrics import classification_report
result_df = pd.DataFrame.from_dict(result)
report = classification_report(result_df.actual_intent, result_df.intent)
print(report)
# COMMAND ----------
result_df[result_df.actual_intent == "change_address"]
# COMMAND ----------
# MAGIC %md
# MAGIC
# MAGIC ### Results Evaluation
# MAGIC
# MAGIC <br/>
# MAGIC
# MAGIC * We were able to get a (weighted) F1 score of 0.93, which is not bad at all!
# MAGIC * Still, for some classes/intents the score is really bad:
# MAGIC   * `change_address`
# MAGIC * If we look at the predicted labels, we'll realise that the model sometimes answers with intents that are not even in our training and testing sets.
# MAGIC * Let's change our prompt, so that our model becomes more assertive in that sense:
# MAGIC   * We'll surround our **utterances** with single quotes
# MAGIC   * We'll also add a bit of text explicitly restricting the answer to the known intents
# COMMAND ----------
prompt_template = """
Below you will find an array containing three examples of utterances for each of these intents. Each example is in JSON format:
{examples}
Based on the examples above, return a JSON object with the actual utterance, and the right intent for the utterance below:
'{utterance}'
Make sure that your answer contains one of the intents below:
{intents}
"""
result = train_predict(prompt_template, df_train, df_test)
# COMMAND ----------
result_df = pd.DataFrame.from_dict(result)
report = classification_report(result_df.actual_intent, result_df.intent)
print(report)
# COMMAND ----------
# MAGIC %md
# MAGIC
# MAGIC * Of course, this looks really good 😀
# MAGIC * We just need to keep in mind that our testing set is quite small - mainly for experimentation / latency purposes.
# MAGIC * In the next notebook we'll expand our testing set, and also introduce other techniques, such as **Chain of Thought (CoT)** and **Active Prompting**.
# COMMAND ----------
# MAGIC %md
# MAGIC
# MAGIC ## Reference
# MAGIC
# MAGIC <hr />
# MAGIC
# MAGIC * [Everything You Need to Know About Few-Shot Learning](https://blog.paperspace.com/few-shot-learning/)
# MAGIC * [Few-Shot Prompting](https://www.promptingguide.ai/techniques/fewshot)