probe: add PAIR from "Jailbreaking Black Box Large Language Models in Twenty Queries 🌶️" #316

leondz · 2023-11-14T13:01:45Z

website: https://jailbreaking-llms.github.io ⭐️
paper: https://arxiv.org/abs/2310.08419
code: https://github.com/patrickrchao/JailbreakingLLMs

leondz · 2023-11-14T19:49:14Z

Which targets should it use? Consider:
a. groups of goals from llm-attacks
b. groups of risks in language model risk cards

leondz added the "probes" (Content & activity of LLM probes) and "new plugin" (Describes an entirely new probe, detector, generator or harness) labels Nov 14, 2023

leondz changed the title ~~add "Jailbreaking Black Box Large Language Models in Twenty Queries 🌶️"~~ probe: add PAIR from "Jailbreaking Black Box Large Language Models in Twenty Queries 🌶️" Nov 15, 2023

erickgalinkin linked a pull request Feb 14, 2024 that will close this issue: Tree of Attacks #446 (Merged)

erickgalinkin self-assigned this Feb 14, 2024

leondz closed this as completed in #446 Feb 20, 2024
