probe: add PAIR from "Jailbreaking Black Box Large Language Models in Twenty Queries 🌶️" #316

Closed
leondz opened this issue Nov 14, 2023 · 1 comment · Fixed by #446
Labels
new plugin: Describes an entirely new probe, detector, generator or harness
probes: Content & activity of LLM probes

Comments

Owner

leondz commented Nov 14, 2023

website: https://jailbreaking-llms.github.io ⭐️
paper: https://arxiv.org/abs/2310.08419
code: https://github.com/patrickrchao/JailbreakingLLMs

leondz added the "probes" and "new plugin" labels on Nov 14, 2023
Owner Author

leondz commented Nov 14, 2023

Which targets should the probe use? Consider:
a. groups of goals from llm-attacks
b. groups of risks from the language model risk cards

leondz changed the title from "add "Jailbreaking Black Box Large Language Models in Twenty Queries 🌶️"" to "probe: add PAIR from "Jailbreaking Black Box Large Language Models in Twenty Queries 🌶️"" on Nov 15, 2023
erickgalinkin linked a pull request on Feb 14, 2024 that will close this issue
erickgalinkin self-assigned this on Feb 14, 2024
2 participants