
FEAT: Adding Master Key Jailbreak #248

Merged: 7 commits into Azure:main on Jun 18, 2024

Conversation

SafwanA02 (Contributor)

Description

Opening this after accidentally closing #235 due to git errors.

  • implemented batching to improve efficiency in the "send_master_key_with_prompts_async" function in master_key_orchestrator (see the sketch after this list)
  • added "print_conversation" function to master_key_orchestrator
  • addressed comments from previous PR
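For context, the batching change roughly groups the attack prompts into fixed-size chunks and sends each chunk concurrently. A minimal sketch of the idea, assuming a per-prompt send coroutine is passed in (the names and signatures below are illustrative, not the actual PyRIT API):

import asyncio
from typing import Awaitable, Callable, Sequence

async def send_batched_async(
    prompts: Sequence[str],
    send_one_async: Callable[[str], Awaitable[str]],
    batch_size: int = 10,
) -> list:
    # Sketch only: send_one_async stands in for the per-conversation send logic
    # (master key prompt followed by the attack prompt).
    results = []
    for start in range(0, len(prompts), batch_size):
        chunk = prompts[start : start + batch_size]
        # Prompts within a chunk run concurrently; chunks run one after another,
        # so at most batch_size requests are in flight at a time.
        results.extend(await asyncio.gather(*(send_one_async(p) for p in chunk)))
    return results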

Tests and Documentation

  • added test to make sure that conversation IDs are the same in corresponding prompts

@SafwanA02 (Contributor Author)


@microsoft-github-policy-service agree company="Microsoft"

@SafwanA02 changed the title from "FEAT: Updating Master Key Jailbreak" to "FEAT: Adding Master Key Jailbreak" on Jun 18, 2024
Comment on lines +99 to +105
await self._prompt_normalizer.send_prompt_async(
normalizer_request=NormalizerRequest([target_master_prompt_obj]),
target=self._prompt_target,
conversation_id=conversation_id,
labels=self._global_memory_labels,
orchestrator_identifier=self.get_identifier(),
)
Contributor

This will probably work most of the time, but I wonder if we might need some error handling in case things don't go as planned. Have we seen any issues during testing that we might want to handle here?

Contributor Author

I haven't run into any issues with this part of the code yet, but I'll keep an eye out for it.

Contributor

Targets handle retries, if that's what you mean, @dlmgary.

)
)

batch_results = await asyncio.gather(*tasks)
Contributor

The PromptNormalizer handles all of this and does the asyncio.gather() for you. Is there a reason why we're not using that?

Contributor

It doesn't support multi-turn conversations at this point. It may be possible to extend it by plumbing through the conversation ID, but there will be more work on this orchestrator anyway and I don't want to hold it up further.

Contributor Author

The prompt normalizer just sends the prompts directly using the send_prompt_async function, but here I'm using the send_master_key_prompt_async function instead because I need it to send the master key prompt first and then follow it up with the attack prompt.
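To illustrate the ordering described here, a rough sketch of the per-conversation flow (the target object and the exact send_prompt_async signature below are simplified assumptions, not the real PyRIT API):

import uuid

async def send_master_key_prompt_async(target, master_key_prompt: str, attack_prompt: str) -> str:
    conversation_id = str(uuid.uuid4())
    # First turn: the master key jailbreak prompt opens the conversation.
    await target.send_prompt_async(prompt=master_key_prompt, conversation_id=conversation_id)
    # Second turn: the attack prompt is sent in the same conversation, which is
    # why both turns must share the same conversation_id.
    return await target.send_prompt_async(prompt=attack_prompt, conversation_id=conversation_id)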

Comment on lines 154 to 156
def _chunked_prompts(self, prompts, size):
for i in range(0, len(prompts), size):
yield prompts[i : i + size]
Contributor

  • Add type hints.
  • Is there a specific reason why we're using generators here? I'm a fan but it seems a bit of an anti-pattern here.
  • This function might not be needed at all; consider removing it.

Contributor

This is copied from prompt normalizer batching code. I suppose it makes it a tad cleaner than having this indexing logic mixed in with the rest of the code. I don't really care either way tbh.
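For reference, a type-hinted standalone version of the chunking helper might look like this (a sketch only; the generic typing and names are assumptions, not the code in the PR):

from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    # Yield successive slices of at most `size` items.
    for i in range(0, len(items), size):
        yield items[i : i + size]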

@romanlutz merged commit 82df0be into Azure:main on Jun 18, 2024
5 checks passed