minicons allows for efficient and intuitive extraction of word and phrase representations from transformer models (in theory, any model available on the Hugging Face Hub). It does so through a wrapper (in the cwe module) around the AutoModel class provided by Hugging Face's transformers package.
For demonstration purposes, let's use the bert-base-uncased model to extract contextual representations of the word aircraft from the list of sentences in the file samplesentences.txt (located in the same directory as this file).
This tutorial requires the following packages:
minicons
torch
transformers
import torch
from minicons import cwe
from minicons.utils import character_span # for demonstrating.
We will now load our sentences, each of which contains the target word aircraft:
sentences = []
with open('samplesentences.txt', 'r') as f:
    for line in f:
        sentences.append(line.strip())
sentences[:2]
'''
['This aircraft works by jet propulsion.',
'His passion is making model aircraft.']
'''
len(sentences)
'''
11
'''
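Since every instance built later pairs a sentence with the word aircraft, it can help to confirm up front that each loaded sentence actually contains that word. The sketch below is one way to do that; load_sentences is a hypothetical helper (not part of minicons), and it reads from any iterable of lines so the example runs without the file.

```python
import io

def load_sentences(lines, target):
    """Strip each line, skip blanks, and check that the target word occurs in it."""
    sentences = []
    for line in lines:
        sentence = line.strip()
        if not sentence:
            continue  # ignore empty lines, e.g. a trailing blank line
        if target not in sentence:
            raise ValueError(f"target {target!r} not found in: {sentence!r}")
        sentences.append(sentence)
    return sentences

# Stand-in for open('samplesentences.txt'); the two sentences are the ones shown above.
fake_file = io.StringIO(
    "This aircraft works by jet propulsion.\n"
    "His passion is making model aircraft.\n"
    "\n"
)
demo_sentences = load_sentences(fake_file, "aircraft")
print(len(demo_sentences))
```

With the real file, the same helper could be called as load_sentences(f, 'aircraft') inside the with block.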
Every representation extraction model in minicons is an instance of the cwe.CWE class. CWE stands for 'contextual word embeddings.'
model = cwe.CWE('bert-base-uncased', device = 'cpu') # also allows gpus, use 'cuda:[NUMBER]' to do so.
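If the same script should run on machines with and without a GPU, the device string can be chosen at runtime. A minimal sketch, assuming only that torch is installed; the variable name device is illustrative, and the commented line mirrors the constructor call above.

```python
import torch

# Use the first GPU when CUDA is available, otherwise fall back to the CPU.
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print(device)

# The string can then be passed on as in the call above:
# model = cwe.CWE('bert-base-uncased', device = device)
```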
The function primarily used for extracting representations from models is model.extract_representation(). It accepts batches of instances represented in either of the following formats:
data = [
    (sentence_1, word_1),
    (sentence_2, word_2),
    ....
    (sentence_n, word_n)
]
or
data = [
    (sentence_1, (start_1, end_1)),
    (sentence_2, (start_2, end_2)),
    ....
    (sentence_n, (start_n, end_n))
]
where (start_i, end_i) are the character span indices for the target word in the i-th sentence, i.e., start_i is the start index and end_i is the end index (exclusive).
For example, the instance ["I like reading books.", (15, 20)] corresponds to the word "books".
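To make the two formats concrete, the sketch below writes the same two instances both ways and checks that slicing each sentence with its span recovers the word. The spans were counted by hand, with the end index exclusive as in Python slicing. The variable names are illustrative.

```python
# Format 1: (sentence, word)
word_format = [
    ("I like reading books.", "books"),
    ("This aircraft works by jet propulsion.", "aircraft"),
]

# Format 2: (sentence, (start, end)), with character indices into the sentence
span_format = [
    ("I like reading books.", (15, 20)),
    ("This aircraft works by jet propulsion.", (5, 13)),
]

# Slicing with each span should give back the word from the first format.
recovered = [sentence[start:end] for sentence, (start, end) in span_format]
print(recovered)  # ['books', 'aircraft']
```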
Regardless of which format is used, extract_representation() reduces the input to the second format. For instance, to get the character span indices of aircraft in the first sentence:
character_span(sentences[0], 'aircraft')
'''
(5, 13)
'''
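Conceptually, going from the first format to the second only needs the position of the word in the sentence. The sketch below uses str.find for this; find_span and to_span_format are hypothetical stand-ins written for illustration, not the implementation behind character_span, and they only locate the first occurrence of the word.

```python
def find_span(sentence, word):
    """Return (start, end) of the first occurrence of word, end exclusive."""
    start = sentence.find(word)
    if start == -1:
        raise ValueError(f"{word!r} does not occur in {sentence!r}")
    return start, start + len(word)

def to_span_format(instances):
    """Convert [(sentence, word), ...] into [(sentence, (start, end)), ...]."""
    return [(sentence, find_span(sentence, word)) for sentence, word in instances]

demo = [
    ("This aircraft works by jet propulsion.", "aircraft"),
    ("His passion is making model aircraft.", "aircraft"),
]
converted = to_span_format(demo)
print(converted[0][1])  # (5, 13)
```

A sentence that repeats the target word would need extra handling, since this sketch always returns the first match.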
The first format is usually the easier way to prepare data for extraction:
instances = []
for s in sentences:
    instances.append([s, 'aircraft'])
We can now use instances as an input to model.extract_representation(). By default, this method extracts representations from the last layer of the model:
model.extract_representation(instances)
'''
tensor([[ 0.4418, 0.2876, -0.4887, ..., -0.9296, 0.2270, 0.4052],
[ 0.6424, 0.3509, 0.1863, ..., -0.7924, 0.0086, -0.5835],
[-0.1713, -0.0228, -0.1329, ..., -0.8026, 0.6910, 0.0614],
...,
[ 0.1711, 0.3773, -1.2992, ..., -0.3187, 0.3004, -0.4013],
[ 0.6711, 0.0556, -1.1710, ..., -0.5866, 0.8467, 0.2816],
[ 0.5522, 0.1332, 0.2180, ..., -0.2806, 1.0105, -0.1799]])
'''
We can even choose a different layer:
model.extract_representation(instances, layer = 5)
'''
tensor([[ 1.1305, 1.2379, -0.3605, ..., -0.2164, 0.7634, 0.2290],
[ 1.5314, 1.1103, -0.3012, ..., 0.3013, 1.1243, -0.1035],
[ 0.5519, 0.3571, 0.1852, ..., -0.0317, 0.3467, -0.5793],
...,
[ 0.2921, 0.5046, -0.8121, ..., -0.0687, 0.5321, 0.0470],
[ 0.6099, 0.9541, -0.5739, ..., 0.1725, 0.2572, -0.0846],
[ 0.8634, 0.3579, -0.1031, ..., -0.4277, -0.0830, -0.3293]])
'''
We can also specify multiple layers, which yields a list of torch tensors, one per requested layer:
model.extract_representation(instances, layer = [11, 12])
'''
[tensor([[ 0.9413, 0.3149, 0.0279, ..., -1.2449, 0.5606, -0.0098],
[ 1.2197, 0.4842, 0.0792, ..., -1.3511, 0.3262, -0.8011],
[ 0.1097, -0.0434, 0.4395, ..., -1.2032, 1.0695, 0.1302],
...,
[ 0.5841, 0.3509, -0.4591, ..., -0.2502, 0.5510, -0.2269],
[ 0.8643, 0.2670, -0.8616, ..., -0.4080, 0.3723, -0.0150],
[ 0.9704, 0.2731, 0.0032, ..., -0.4349, 1.3013, -0.1415]]),
tensor([[ 0.4418, 0.2876, -0.4887, ..., -0.9296, 0.2270, 0.4052],
[ 0.6424, 0.3509, 0.1863, ..., -0.7924, 0.0086, -0.5835],
[-0.1713, -0.0228, -0.1329, ..., -0.8026, 0.6910, 0.0614],
...,
[ 0.1711, 0.3773, -1.2992, ..., -0.3187, 0.3004, -0.4013],
[ 0.6711, 0.0556, -1.1710, ..., -0.5866, 0.8467, 0.2816],
[ 0.5522, 0.1332, 0.2180, ..., -0.2806, 1.0105, -0.1799]])]
'''
# These can be pooled (here, averaged across layers) using the following code:
torch.stack(model.extract_representation(instances, layer = [11, 12])).mean(0)
'''
tensor([[ 0.6916, 0.3012, -0.2304, ..., -1.0873, 0.3938, 0.1977],
[ 0.9310, 0.4175, 0.1328, ..., -1.0717, 0.1674, -0.6923],
[-0.0308, -0.0331, 0.1533, ..., -1.0029, 0.8803, 0.0958],
...,
[ 0.3776, 0.3641, -0.8791, ..., -0.2845, 0.4257, -0.3141],
[ 0.7677, 0.1613, -1.0163, ..., -0.4973, 0.6095, 0.1333],
[ 0.7613, 0.2032, 0.1106, ..., -0.3577, 1.1559, -0.1607]])
'''
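To see what the stacking and averaging do to the shapes, here is the same pattern on small made-up tensors standing in for two layers' outputs (2 instances, 3 dimensions each). The values are invented for illustration and do not come from any model.

```python
import torch

# Toy stand-ins for the per-layer outputs: each is (num_instances, hidden_size).
layer_a = torch.tensor([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])
layer_b = torch.tensor([[3.0, 2.0, 1.0],
                        [0.0, 1.0, 2.0]])
per_layer = [layer_a, layer_b]

stacked = torch.stack(per_layer)  # shape: (num_layers, num_instances, hidden_size)
pooled = stacked.mean(0)          # average over the layer axis

print(stacked.shape)  # torch.Size([2, 2, 3])
print(pooled)
```

Other reductions follow the same pattern, for example stacked.sum(0) for a sum, or torch.cat(per_layer, dim=-1) to concatenate the layers instead of averaging them.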
A small trick for extracting from the last four layers of the model uses the model.layers attribute (the number of layers in the model):
last_four = list(range(model.layers+1))[-4:]
model.extract_representation(instances, layer = last_four)
'''
[tensor([[ 0.9946, 0.9216, -0.2156, ..., -0.4210, 0.3931, 0.2772],
[ 1.3800, 0.7844, -0.3833, ..., -0.2114, 0.5890, -0.4240],
[ 0.2445, 0.2528, 0.3336, ..., -0.4638, 0.3499, -0.2713],
...,
[ 0.6907, 0.4878, -0.2212, ..., 0.0482, 0.0307, 0.0583],
[ 0.6474, 0.9128, -0.1065, ..., -0.6336, -0.1677, -0.2797],
[ 1.0601, 0.2845, 0.2318, ..., -0.7379, 0.6266, -0.3281]]),
tensor([[ 1.0311, 0.2918, 0.0645, ..., -1.2735, 0.6836, -0.3382],
[ 1.3628, 0.4729, -0.1582, ..., -0.9663, 0.5652, -0.9539],
[ 0.3337, -0.0796, 0.3472, ..., -0.8018, 0.7095, -0.5963],
...,
[ 0.8667, 0.5114, -0.4021, ..., -0.2998, 0.3388, -0.5384],
[ 0.5956, 0.2355, -0.1907, ..., -0.3381, 0.0460, -0.2902],
[ 1.1451, 0.1151, 0.1154, ..., -0.5127, 1.3450, -0.5618]]),
tensor([[ 0.9413, 0.3149, 0.0279, ..., -1.2449, 0.5606, -0.0098],
[ 1.2197, 0.4842, 0.0792, ..., -1.3511, 0.3262, -0.8011],
[ 0.1097, -0.0434, 0.4395, ..., -1.2032, 1.0695, 0.1302],
...,
[ 0.5841, 0.3509, -0.4591, ..., -0.2502, 0.5510, -0.2269],
[ 0.8643, 0.2670, -0.8616, ..., -0.4080, 0.3723, -0.0150],
[ 0.9704, 0.2731, 0.0032, ..., -0.4349, 1.3013, -0.1415]]),
tensor([[ 0.4418, 0.2876, -0.4887, ..., -0.9296, 0.2270, 0.4052],
[ 0.6424, 0.3509, 0.1863, ..., -0.7924, 0.0086, -0.5835],
[-0.1713, -0.0228, -0.1329, ..., -0.8026, 0.6910, 0.0614],
...,
[ 0.1711, 0.3773, -1.2992, ..., -0.3187, 0.3004, -0.4013],
[ 0.6711, 0.0556, -1.1710, ..., -0.5866, 0.8467, 0.2816],
[ 0.5522, 0.1332, 0.2180, ..., -0.2806, 1.0105, -0.1799]])]
'''
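The slicing in this trick can be checked without loading a model. The sketch below assumes model.layers is 12 for bert-base-uncased (consistent with layer 12 matching the default last-layer output above); num_layers is a stand-in for that attribute.

```python
num_layers = 12  # assumed value of model.layers for bert-base-uncased

# Layer indices are assumed to run from 0 to num_layers inclusive, hence the +1.
all_layers = list(range(num_layers + 1))
last_four = all_layers[-4:]
print(last_four)  # [9, 10, 11, 12]
```

An equivalent form that avoids building the full list is list(range(num_layers - 3, num_layers + 1)).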
Each instance can also pair a sentence with one of its constituent phrases, in which case a single representation is extracted for the whole phrase (by default, the representations are pooled using an average). For instance:
phrases = [
    ['I like reading books.', 'reading books'],
    ['I also like riding my bike!', 'riding my bike']
]
model.extract_representation(phrases)
'''
tensor([[ 0.2985, 0.6786, 0.2250, ..., 0.1723, 0.3650, -0.5355],
[ 1.2477, 0.1224, -0.0942, ..., -0.0835, -0.2701, -0.2143]])
'''
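As a rough picture of what average pooling over a phrase means, the sketch below selects the tokens that overlap a character span and averages their vectors. It is not how minicons implements this internally; the word-level token offsets, the 2-dimensional vectors, and the helper names are all made up for illustration (a real BERT tokenizer produces subword tokens, and bert-base-uncased vectors have 768 dimensions).

```python
def tokens_in_span(offsets, span):
    """Indices of tokens whose (start, end) character offsets overlap the span."""
    start, end = span
    return [i for i, (s, e) in enumerate(offsets) if s < end and e > start]

def mean_pool(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# "I like reading books." split into toy tokens with character offsets.
offsets = [(0, 1), (2, 6), (7, 14), (15, 20), (20, 21)]
# One invented 2-dimensional vector per token.
vectors = [[1.0, 0.0], [0.0, 2.0], [2.0, 4.0], [4.0, 0.0], [1.0, 1.0]]

phrase_span = (7, 20)  # character span of "reading books"
selected = tokens_in_span(offsets, phrase_span)
pooled = mean_pool([vectors[i] for i in selected])
print(selected, pooled)  # [2, 3] [3.0, 2.0]
```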
Fin.