Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT Prompt Shield #271

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

ValbuenaVC
Copy link

@ValbuenaVC ValbuenaVC commented Jul 2, 2024

Description

Added:

  • PromptShieldTarget: Jailbreak/attack detector using the Prompt Shield Content Safety Resource.
  • PromptShieldScorer: A true/false scorer returning True when an attack is detected.
  • Change to net_utility in pyrit/common to accept HTTPS params as an argument in make_request_and_raise_if_error().

Tests

  • PromptShieldTarget: Completed
  • PromptShieldScorer: Completed

TODO

  • Jupytext file code
  • Fix api version hardcoding
  • Finish tests
  • Add keys to keyvault

@ValbuenaVC
Copy link
Author

@microsoft-github-policy-service agree company="Microsoft"

@blakebullwinkel
Copy link
Contributor

Great work @ValbuenaVC ! I think you can remove the "DRAFT" tag from the title as this PR is actively being reviewed.

ValbuenaVC and others added 2 commits July 9, 2024 13:36
Format imports to standard

Co-authored-by: Raja Sekhar Rao Dheekonda <43563047+rdheekonda@users.noreply.github.com>
@ValbuenaVC ValbuenaVC changed the title FEAT [DRAFT] Prompt Shield Features FEAT Prompt Shield Jul 9, 2024
"name": "stderr",
"output_type": "stream",
"text": [
"Error during table creation: (duckdb.duckdb.IOException) IO Error: File is already open in \n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend re-running without error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this can happen if you have your db open e.g. in dbeaver)

load_default_env()

# +
#NOTE: This is throwing an IOError, but I'm not sure why?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean up comment

)


# with ScoringOrchestrator(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this up :)

{
"cell_type": "code",
"execution_count": null,
"metadata": {},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this file?

@@ -30,6 +30,7 @@ def __init__(
self,
prompt_target: PromptTarget,
prompt_converters: Optional[list[PromptConverter]] = None,
normalizer: Optional[PromptNormalizer] = None, #NOTE Added
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this new parameter? Can we get rid of it?

If we wanted to inject this for tests, I could see it being useful, but as far as I can see it's not currently used?

) -> list[PromptRequestResponse]:
"""
Sends the prompts to the prompt target.
"""

# This happens often when trying to send one prompt, but forgetting to wrap it in brackets.
if not isinstance(prompt_list, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case a user passes a string, should we treat it like a list of one?

self._memory = memory if memory else DuckDBMemory()
self._target: PromptShieldTarget = target

async def score_async(self, request_response: PromptRequestPiece | PromptMemoryEntry) -> list[Score]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we updated this interface to add one extra field; but it can throw an error if passed


def __init__(
self,
target: PromptShieldTarget,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we rename to prompt_shield_target so we can keep in mind it's specific?

self,
request_response: Any
) -> None:
if not isinstance(request_response, PromptRequestPiece) and not isinstance(request_response, PromptMemoryEntry):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we prob don't need to do type checking, but it may be nice to only check text due to the target only supporting text

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(mypy should take care of type checking more or less)

) -> None:

self.scorer_type = "true_false"
self._conversation_id = str(uuid.uuid4())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be created in score_async so previous conversations aren't sent

def sample_conversations() -> list[PromptRequestPiece]:
return get_sample_conversations()

@pytest.fixture
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check code coverage percentages and comment here on them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants