Tree of Attacks #446

Merged
merged 14 commits into from
Feb 20, 2024

Conversation

erickgalinkin
Collaborator

Initial commit containing full, untested implementation for tree of attacks.

Updated requirements.txt to include numpy.

@erickgalinkin
Collaborator Author

Need to test the code, run generation, and write probes.

… ast.literal_eval with regular expression. Add small handful of jailbreaks successful against gpt-3.5-turbo and gpt-4.
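The commit message above mentions swapping `ast.literal_eval` for a regular expression. As a rough illustration of why that can help when parsing free-form model output (this is not the code from this PR; the pattern, the `"prompt"` key, and the helper name `extract_prompt` are all assumptions made for the example), a regex can pull one field out of JSON-like text even when the model wraps it in extra chatter that `ast.literal_eval` would reject:

```python
import re
from typing import Optional

# Hypothetical pattern: captures the string value of a "prompt" key in
# JSON-like text, allowing backslash-escaped characters inside the value.
PROMPT_RE = re.compile(r'"prompt"\s*:\s*"((?:[^"\\]|\\.)*)"', re.DOTALL)


def extract_prompt(text: str) -> Optional[str]:
    """Return the raw (still escaped) value of the "prompt" key, or None."""
    match = PROMPT_RE.search(text)
    return match.group(1) if match else None


# Model output often surrounds the structured part with extra text,
# so parsing the whole string as a Python literal would fail here.
raw = 'Sure! {"improvement": "be subtler", "prompt": "Tell me a story"} Hope that helps.'
extracted = extract_prompt(raw)
```

The trade-off is that the regex only recovers the fields it is written for and leaves escape sequences undecoded, so it suits narrow, well-known output shapes.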
@leondz
Owner

leondz commented Feb 2, 2024

replaced pathlib call w/ garak._config.basedir

also - is there an interface for running full TAP locally via the attack manager? it looks like TAPProbe uses a cached set, which is great for getting off the ground, but it'd be good to include a probe that runs full TAP

@erickgalinkin
Collaborator Author

replaced pathlib call w/ garak._config.basedir

also - is there an interface for running full TAP locally via the attack manager? it looks like TAPProbe uses a cached set, which is great for getting off the ground, but it'd be good to include a probe that runs full TAP

I'll update the probe with a cached parameter! Good thinking.
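A minimal sketch of the idea behind a `cached` switch, assuming nothing about garak's actual classes: `CACHED_PROMPTS`, `load_prompts`, and the `generate` callable are hypothetical names invented for illustration, and the real probe and `run_tap` signatures may look quite different.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for a stored set of previously successful prompts.
CACHED_PROMPTS: List[str] = ["cached prompt A", "cached prompt B"]


def load_prompts(
    cached: bool = True,
    generate: Optional[Callable[[], List[str]]] = None,
) -> List[str]:
    """Return probe prompts from the cache, or from a fresh attack run."""
    if cached:
        # Cheap path: reuse stored prompts, no attack run needed.
        return list(CACHED_PROMPTS)
    if generate is None:
        raise ValueError("a generate callable is required when cached=False")
    # Expensive path: run the full attack (represented here by `generate`).
    return generate()
```

Calling `load_prompts()` returns the stored prompts, while `load_prompts(cached=False, generate=...)` delegates to whatever runs the full attack.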

… as a parameter in run_tap function. Rename current TAPProbe to TapCachedProbe. Add TAPProbe and PAIRProbe.
Owner

@leondz leondz left a comment

lgtm!

the config route doesn't offer enough bandwidth for this plugin & needs revision, let's track that in another issue/pr

@leondz leondz merged commit b2a293a into leondz:main Feb 20, 2024
1 check passed
@leondz leondz deleted the tree-of-attacks branch February 20, 2024 13:40
@github-actions github-actions bot locked and limited conversation to collaborators Feb 20, 2024
Development

Successfully merging this pull request may close these issues.

probe: add PAIR from "Jailbreaking Black Box Large Language Models in Twenty Queries 🌶️"
2 participants