Implement ngram-based blacklist filter #54

Merged
merged 52 commits into from
Nov 9, 2023

steffencruz
Contributor

@steffencruz steffencruz commented Oct 30, 2023

Introduces an n-gram-based blacklist. The blacklist is based on an n-gram counter with a sliding window (deque). Conceptually, when completions contain similar subsequences, the associated n-grams accumulate counts and significance. When the significance exceeds a threshold (Blacklist.boundary), all completions containing that n-gram are flagged and receive no reward. The sliding window allows the blacklist to adapt to the changing network.
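To make the mechanism concrete, here is a minimal sketch of a sliding-window n-gram counter. It is not the code in this PR: the class name `SlidingNgramCounter`, its attributes, the whitespace tokenization, and in particular the significance formula (count times n-gram length) are all assumptions made for illustration.

```python
from collections import Counter, deque


class SlidingNgramCounter:
    """Hypothetical sketch of an n-gram counter backed by a bounded window."""

    def __init__(self, n_min=5, n_max=15, max_size=1_000_000, boundary=1000):
        self.n_min, self.n_max = n_min, n_max
        self.max_size = max_size
        self.boundary = boundary
        self.window = deque()     # n-grams in insertion order
        self.counter = Counter()  # counts of n-grams currently in the window

    def ngrams(self, text):
        # Assumed tokenization: lowercase, split on whitespace.
        words = text.lower().split()
        for n in range(self.n_min, self.n_max + 1):
            for i in range(len(words) - n + 1):
                yield tuple(words[i:i + n])

    def add(self, completions):
        for completion in completions:
            for ngram in self.ngrams(completion):
                self.window.append(ngram)
                self.counter[ngram] += 1
                # Evict the oldest n-gram once the window is full.
                if len(self.window) > self.max_size:
                    old = self.window.popleft()
                    self.counter[old] -= 1
                    if self.counter[old] == 0:
                        del self.counter[old]

    def significance(self, ngram):
        # Assumed scoring, not the PR's formula: count weighted by length.
        return self.counter[ngram] * len(ngram)

    def is_flagged(self, completion):
        # A completion is flagged if any of its n-grams exceeds the boundary.
        return any(self.significance(ng) > self.boundary
                   for ng in self.ngrams(completion))


demo = SlidingNgramCounter(n_min=2, n_max=2, max_size=100, boundary=5)
demo.add(["that is great"] * 3)
print(demo.is_flagged("well that is"), demo.is_flagged("something else entirely"))
```

Because old n-grams are evicted as new ones arrive, a phrase that stops appearing eventually drops back below the boundary, which is the adaptive behaviour described above.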

Runtime

Estimated runtime for adding to the counter is 25 ms per step (50 completions).
Estimated runtime for rewarding completions is currently unknown, but it requires a full pass over a dict of up to max_size (~1M) items.

Hyperparameters

There is a tradeoff between speed and memory consumption. I think 1M is okay for the queue size but 100k works too.
The length of the queue and the throughput of the network (words per unit time) determine how rapidly the blacklist learns and forgets n-grams. I estimate that one step contains around 3400 completion words, so a queue of length 1M should be completely flushed in about 300 steps. This means that there will be a record of an n-gram for at most 300 steps, or ~2.5 hours (at 30 s per step).
n_min = 5 and n_max = 15 seem reasonable to me. If n_min is too large, exploits built from short phrases will go undetected.
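The retention estimate above is simple arithmetic and can be reproduced as follows. The helper `retention_window` is hypothetical; it follows the comment in treating completion words per step as the rate at which the queue fills.

```python
def retention_window(queue_size, items_per_step, seconds_per_step=30):
    """Return (steps, hours) until a queue of `queue_size` is fully flushed."""
    steps = queue_size / items_per_step
    hours = steps * seconds_per_step / 3600
    return steps, hours


for size in (1_000_000, 100_000):
    steps, hours = retention_window(size, items_per_step=3400)
    print(f"queue={size:>9,}: ~{steps:.0f} steps, ~{hours:.2f} hours")
```

With the smaller 100k queue the memory of an n-gram shrinks by the same factor of ten, so the choice of queue size is also a choice of how long an exploit phrase stays penalised.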

Notes

Long n-grams end up creating multiple entries in the blacklist because their subsequences are necessarily at least as frequent. E.g. ('this','is','an','example','long','sentence') also guarantees the presence of the n-gram ('an','example','long','sentence') (and all the other subsequences) with at least as many counts. Because of this, a common 14-gram also has a common 13-gram, 12-gram, ..., which tend to fill up the queue quite rapidly.
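A small sketch of the effect described in this note: enumerating every shorter contiguous n-gram implied by a long one. The helper `sub_ngrams` is hypothetical and only illustrates the counting; it is not part of the Blacklist class.

```python
def sub_ngrams(ngram, n_min):
    """All contiguous subsequences of `ngram` with n_min <= length < len(ngram)."""
    n = len(ngram)
    return [ngram[i:i + k] for k in range(n_min, n) for i in range(n - k + 1)]


example = ('this', 'is', 'an', 'example', 'long', 'sentence')
subs = sub_ngrams(example, n_min=4)
print(len(subs))  # three 4-grams plus two 5-grams
print(('an', 'example', 'long', 'sentence') in subs)

# With n_min = 5, one occurrence of a 14-gram implies this many shorter n-grams:
print(len(sub_ngrams(tuple(range(14)), n_min=5)))
```

With n_min = 5, each occurrence of a 14-gram brings 54 shorter n-grams with it, which is why long repeated phrases consume queue capacity so quickly.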

TODO

  • Tracked experiments to determine optimal blacklist parameters (would be good to include raw significance scores in those runs rather than only the binary output)
  • Pass Blacklist parameters via config
  • Unit tests

@steffencruz
Contributor Author

Come to think of it, we can extend the effective time window by reducing the n-gram addition rate via stochastic sampling. Instead of adding all n-grams in a completion, we can simply select a random subset.

Right now around 12,000 n-grams are added per step (from 50 completions combined). We can likely reduce this by a factor of 5-10x by randomly selecting n-grams.
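A rough sketch of what that sampling could look like. The function `sample_ngrams` and its `rate` parameter are hypothetical; the PR does not specify how the subset would be chosen, and independent Bernoulli sampling is just one option.

```python
import random


def sample_ngrams(ngrams, rate=0.2, rng=None):
    """Keep each n-gram independently with probability `rate`."""
    rng = rng or random.Random()
    return [ng for ng in ngrams if rng.random() < rate]


# Stand-in for the ~12,000 n-grams produced in one step.
step_ngrams = [("ngram", i) for i in range(12_000)]
kept = sample_ngrams(step_ngrams, rate=0.2, rng=random.Random(0))
print(len(kept), "of", len(step_ngrams), "n-grams kept")
```

A rate of 0.1 to 0.2 corresponds to the 5-10x reduction mentioned above. Note that counts would shrink by roughly the same factor, so the boundary would presumably need to be rescaled to match.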

@steffencruz
Contributor Author

steffencruz commented Oct 31, 2023

import tqdm
import wandb
import numpy as np
import pandas as pd
import plotly.express as px
from prompting.validators.reward.blacklist import Blacklist

api = wandb.Api()

# use a fixed wandb run for reproducibility
run = api.run('opentensor-dev/openvalidators/7rahrixe')

df = pd.DataFrame(run.history(samples=1000))

ngram_manager = Blacklist()

# get the batches
batches = df.loc[df.completions.notna(), 'completions']

# inject some test phrases into the completions to see how they would be detected by the blacklist
test_phrases = [
    {'phrase':'that is an excellent question', 'begin':0, 'end':200, 'probability':0.25},
    {'phrase':'Sure! I\'d be happy to help you with that enquiry', 'begin':200, 'end':300, 'probability':0.15},
    {'phrase':'Hell yeah! I\'m ready to do this right now!', 'begin':500, 'end':700, 'probability':0.05},
]
n_test_phrases = 0

save_every = 20
snapshots = []
for i, completions in enumerate(tqdm.tqdm(batches)):
    comps = []
    for completion in completions:
        c = completion
        for phrase in test_phrases:
            if (phrase.get('begin',0) <= i <= phrase.get('end', len(batches))) and phrase.get('probability',1)>np.random.rand():
                c += ' ' + phrase['phrase']
                n_test_phrases += 1
                break
        comps.append(c)
    ngram_manager.add(comps)

    if i%save_every == 0:
        snapshots.append({'step':i, 'deque_length':len(ngram_manager.deque), 'running_size':ngram_manager._running_size, 'significance':ngram_manager.most_significant(10), 'counts':ngram_manager.most_common(10)})

df_snapshots = pd.DataFrame(snapshots)

def make_top_ngrams(x, ntop=10):
    return pd.DataFrame( [{'rank': i, 'phrase':' '.join(xx[0]), 'significance':xx[1]} for i, xx in enumerate(x[:ntop], start=1)])

traces = []
ntop = 10
for idx, row in df_snapshots.iterrows():
    traces.append(make_top_ngrams(row.significance, ntop=ntop).assign(step=row.step))#, date=row.Date, name=row.name, block=row.block))

df_traces = pd.concat(traces)
colors = [px.colors.find_intermediate_color('rgb(0,0,250)', 'rgb(250,0,0)', (ntop-i)/ntop, colortype='rgb') for i in range(0, ntop+1)]
fig = px.line(df_traces, x='step', y='significance', color='rank', #markers=True,
        hover_name='phrase', #hover_data=['date', 'name', 'block'],
        color_discrete_sequence=colors,
        width=800, height=600, template='plotly_white'
        )
fig.update_traces(opacity=0.5)

This produces the following trace data:
image
image
image
The injected phrases are well detected by the blacklist, which justifies a boundary of ~1000 given the default parameters.

The same trace data without the injected phrases looks like this
image


# Check if any n-grams have significance above the boundary
for ngram in ngrams:
    if scores.get(ngram) > self.boundary:
Contributor Author

It would be useful to include a self.blacklist attribute which contains all the n-grams with significance > self.boundary. It should not be a very large object (tens to hundreds of items). This is something we could log to wandb regularly to introspect the blacklist.
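One way this could look, as a sketch only: the helper `build_blacklist` and the `significance_scores` mapping are hypothetical stand-ins for whatever the class computes internally.

```python
def build_blacklist(significance_scores, boundary):
    """Return only the n-grams whose significance exceeds `boundary`."""
    return {ngram: score
            for ngram, score in significance_scores.items()
            if score > boundary}


scores = {
    ("that", "is", "an", "excellent", "question"): 2400.0,
    ("the", "capital", "of", "france", "is"): 35.0,
}
blacklist = build_blacklist(scores, boundary=1000)

# Joining the words gives a compact, loggable view of the current blacklist.
for ngram, score in blacklist.items():
    print(" ".join(ngram), score)
```

Since the result is small, it could be refreshed whenever significance scores are recomputed and logged alongside the other step metrics.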

@steffencruz steffencruz mentioned this pull request Nov 1, 2023
Contributor

@p-ferreira p-ferreira left a comment

Things to be done:

Contributor

@p-ferreira p-ferreira left a comment

LGTM. I would be OK to merge it once the branch conflicts have been fixed.

@p-ferreira p-ferreira marked this pull request as ready for review November 8, 2023 22:16
Contributor Author

@steffencruz steffencruz left a comment

Some minor comments, then it's good to go

isabella618033 and others added 18 commits November 8, 2023 17:45
Co-authored-by: Steffen Cruz <steffenjcruz@gmail.com>
@p-ferreira p-ferreira mentioned this pull request Nov 9, 2023
@p-ferreira p-ferreira merged commit 0da7803 into staging Nov 9, 2023
4 checks passed
3 participants