Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add red teaming orchestrators to replace RedTeamingBot #84

Merged
merged 23 commits into from
Mar 14, 2024

Conversation

romanlutz
Copy link
Contributor

@romanlutz romanlutz commented Mar 5, 2024

Description

This PR adds the BaseRedTeamingOrchestrator abstract class as well as two implementation classes ScoringRedTeamingOrchestrator and EndTokenRedTeamingOrchestrator. With these changes, a few others were necessary and are therefore part of this PR:

  • Introduced the AttackStrategy that encapsulates the functionality previously spread over 3 input args of RedTeamingBot (attack_strategy, attack_strategy_kwargs, and conversation_objective) and allows for just string inputs as strategies, too.
  • Every reference to RedTeamingBot needed to be updated and all the logic replaced. This includes several notebooks and documentation files.
  • Added GandalfTarget which does not support async.

What is not included here?

  • Detection of a second conversation (an orchestrator should only be used for 1 right now)
  • support for 1-to-many converters (we currently raise an exception in such a case)

Tests

  • no new tests required
  • new tests added
  • existing tests adjusted

Documentation

  • no documentation changes needed
  • documentation added or edited
  • example notebook added or updated

Copy link
Contributor

@rlundeen2 rlundeen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @romanlutz! Love the architecture and separation. Really good design

Only thing missing is tests, but everything looks good to me

@romanlutz
Copy link
Contributor Author

romanlutz commented Mar 9, 2024

Great work @romanlutz! Love the architecture and separation. Really good design

Only thing missing is tests, but everything looks good to me

Thanks! I've been looking into tests the last couple of days. There are a few things that we still need to settle on, including

  • The memory now includes two conversations (the one with RT bot and the one with the target) with separate conversation IDs. When we run get_memory() on the orchestrator, what do we actually expect in return? Just the conversation with the target? Or both?
  • If you provide converters that add more prompts, then we suddenly have multiple conversations to manage. What is the expected outcome? For example, the red teaming orchestrator starts with 1 prompt, the converter makes it 2 (the original and a rewritten one). Now we have two conversations with the target. Should it continue both conversations? If so, then on the next go-around we have 4. This could potentially explode if there are too many turns. I'm leaning towards adding a validation of input converters. It would run a prompt through the converters and raise an exception if they generate more than 1. Anyway, we need to talk about this one...
  • Do we expect people to reuse their orchestrator objects? If so, then the conversation ID will not get reset (at least in the current implementation), meaning that we can't rely on the memory to tell us if we're at the start of a conversation (as it contains prior ones). This is important for deciding what to send as the initial prompt, but also more generally.

@rlundeen2
Copy link
Contributor

Great work @romanlutz! Love the architecture and separation. Really good design
Only thing missing is tests, but everything looks good to me

Thanks! I've been looking into tests the last couple of days. There are a few things that we still need to settle on, including

  • The memory now includes two conversations (the one with RT bot and the one with the target) with separate conversation IDs. When we run get_memory() on the orchestrator, what do we actually expect in return? Just the conversation with the target? Or both?

I would say get_memory() is not a required function, so depends on the orchestrator. But to me it makes the most sense to just return the conversation with the target?

  • If you provide converters that add more prompts, then we suddenly have multiple conversations to manage. What is the expected outcome? For example, the red teaming orchestrator starts with 1 prompt, the converter makes it 2 (the original and a rewritten one). Now we have two conversations with the target. Should it continue both conversations? If so, then on the next go-around we have 4. This could potentially explode if there are too many turns. I'm leaning towards adding a validation of input converters. It would run a prompt through the converters and raise an exception if they generate more than 1. Anyway, we need to talk about this one...

Good point. We may want to add a constraint on certain types of converters so they can only do one. But let's design it.

  • Do we expect people to reuse their orchestrator objects? If so, then the conversation ID will not get reset (at least in the current implementation), meaning that we can't rely on the memory to tell us if we're at the start of a conversation (as it contains prior ones). This is important for deciding what to send as the initial prompt, but also more generally.

I think this is okay. If we run across use cases where folks want to reuse it, we can add that functionality, but I'd expect it to be manual e.g. reset_target_conversation or something

@romanlutz romanlutz changed the title [WIP] Add RedTeamingOrchestrator to replace RedTeamingBot Add red teaming orchestrators to replace RedTeamingBot Mar 12, 2024
pyrit/orchestrator/base_red_teaming_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/base_red_teaming_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/base_red_teaming_orchestrator.py Outdated Show resolved Hide resolved
pyrit/prompt_converter/ascii_art_converter.py Show resolved Hide resolved
@romanlutz romanlutz dismissed dlmgary’s stale review March 14, 2024 04:48

Will need to address these at a later point and will follow up with Gary. Would prefer to unblock the team by merging this tonight if possible.

@romanlutz romanlutz merged commit d691b9c into Azure:main Mar 14, 2024
4 checks passed
@romanlutz romanlutz deleted the romanlutz/redteamingbot_orchestrator branch March 14, 2024 05:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants