FEAT Prompt Shield #271

ValbuenaVC · 2024-07-02T23:18:35Z

Description

Added:

PromptShieldTarget: Jailbreak/attack detector using the Prompt Shield Content Safety Resource.
PromptShieldScorer: A true/false scorer returning True when an attack is detected.
Change to net_utility in pyrit/common: make_request_and_raise_if_error() now accepts HTTP request params as an argument.
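
As a rough illustration of why a params argument helps here (for example, to pass the API version instead of hardcoding it in the URL), here is a minimal stdlib-only sketch. The helper name build_request_url, the example.invalid endpoint, and the "2024-01-01" version string are all placeholders invented for illustration; this is not the code in net_utility.py.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_request_url(endpoint_uri: str, params: Optional[dict[str, str]] = None) -> str:
    """Hypothetical helper: append query params to an endpoint URI."""
    if not params:
        return endpoint_uri
    scheme, netloc, path, query, fragment = urlsplit(endpoint_uri)
    extra = urlencode(params)
    # Preserve any query string already present on the endpoint.
    query = f"{query}&{extra}" if query else extra
    return urlunsplit((scheme, netloc, path, query, fragment))


# The api-version value is a placeholder, not a real service version.
url = build_request_url("https://example.invalid/shield", {"api-version": "2024-01-01"})
```

Keeping the version in params rather than in the endpoint string means callers can change it without touching the target's URL handling.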

Tests

PromptShieldTarget: Completed
PromptShieldScorer: Completed

TODO

Jupytext file code
Fix api version hardcoding
Finish tests
Add keys to keyvault

ValbuenaVC · 2024-07-02T23:19:27Z

@microsoft-github-policy-service agree company="Microsoft"

pyrit/common/net_utility.py

pyrit/prompt_target/prompt_shield_target.py

blakebullwinkel · 2024-07-08T21:37:26Z

Great work, @ValbuenaVC! I think you can remove the "DRAFT" tag from the title, as this PR is actively being reviewed.

pyrit/score/prompt_shield_scorer.py

pyrit/prompt_target/prompt_shield_target.py

doc/code/targets/prompt_shield_target.ipynb

doc/code/scoring/6_prompt_shield_scorer.ipynb

pyrit/common/net_utility.py

pyrit/prompt_target/prompt_shield_target.py

pyrit/score/prompt_shield_scorer.py

Format imports to standard Co-authored-by: Raja Sekhar Rao Dheekonda <43563047+rdheekonda@users.noreply.github.com>

…ith the rest of PyRIT.

…naVC/PyRIT into t-vivalbuena/PromptShield Merging to combine suggested changes with local changes.

…ress.

…naVC/PyRIT into t-vivalbuena/PromptShield Adding jupytext-generated .py files for the Prompt Shield tutorials.

rlundeen2 · 2024-08-06T16:00:31Z

doc/code/scoring/6_prompt_shield_scorer.ipynb

+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Error during table creation: (duckdb.duckdb.IOException) IO Error: File is already open in \n",


I recommend re-running the notebook so the committed output doesn't include this error.

(This can happen if you have your DB open elsewhere, e.g. in DBeaver.)

rlundeen2 · 2024-08-06T16:03:07Z

doc/code/scoring/6_prompt_shield_scorer.py

+load_default_env()
+
+# +
+#NOTE: This is throwing an IOError, but I'm not sure why?


Please clean up this comment.

rlundeen2 · 2024-08-06T16:05:05Z

doc/code/scoring/6_prompt_shield_scorer.py

+)
+
+
+# with ScoringOrchestrator(


Fix this up :)

rlundeen2 · 2024-08-06T16:11:33Z

pyrit/datasets/prompt_templates/GetDatesFromPromptTemplates.ipynb

+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},


Can we remove this file?

rlundeen2 · 2024-08-06T16:15:30Z

pyrit/orchestrator/prompt_sending_orchestrator.py

@@ -30,6 +30,7 @@ def __init__(
        self,
        prompt_target: PromptTarget,
        prompt_converters: Optional[list[PromptConverter]] = None,
+        normalizer: Optional[PromptNormalizer] = None, #NOTE Added


Do we need this new parameter? Can we get rid of it?

If we wanted to inject this for tests, I could see it being useful, but as far as I can see it's not currently used?
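
For context on the "inject this for tests" case, a minimal sketch of what an optional injected dependency buys you. All class names below (DefaultNormalizer, RecordingNormalizer, SketchOrchestrator) are hypothetical stand-ins, not PyRIT's PromptNormalizer or PromptSendingOrchestrator.

```python
from typing import Optional


class DefaultNormalizer:
    """Hypothetical default dependency used when nothing is injected."""

    def normalize(self, prompt: str) -> str:
        return prompt.strip()


class RecordingNormalizer(DefaultNormalizer):
    """Test double that records calls instead of doing real work."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def normalize(self, prompt: str) -> str:
        self.calls.append(prompt)
        return "normalized"


class SketchOrchestrator:
    """Hypothetical orchestrator with an optional, injectable normalizer."""

    def __init__(self, normalizer: Optional[DefaultNormalizer] = None) -> None:
        # Fall back to the default when the caller does not inject one.
        self._normalizer = normalizer if normalizer is not None else DefaultNormalizer()

    def send(self, prompt: str) -> str:
        return self._normalizer.normalize(prompt)
```

If no test actually passes a double like this, the parameter is dead weight, which is the point of the question above.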

rlundeen2 · 2024-08-06T16:16:34Z

pyrit/orchestrator/prompt_sending_orchestrator.py

    ) -> list[PromptRequestResponse]:
        """
        Sends the prompts to the prompt target.
        """

+        # This happens often when trying to send one prompt, but forgetting to wrap it in brackets.
+        if not isinstance(prompt_list, list):


In the case a user passes a string, should we treat it like a list of one?
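
A minimal sketch of the "list of one" option, assuming prompts are plain strings. The helper name as_prompt_list is made up for illustration and is not part of the orchestrator.

```python
from typing import Union


def as_prompt_list(prompts: Union[str, list[str]]) -> list[str]:
    """Hypothetical helper: accept a single prompt or a list of prompts."""
    if isinstance(prompts, str):
        # A bare string is treated as a one-element list rather than rejected.
        return [prompts]
    return list(prompts)
```

The explicit str check matters because a string is itself iterable; without it, list("abc") would silently split one prompt into characters.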

rlundeen2 · 2024-08-06T16:20:36Z

pyrit/score/prompt_shield_scorer.py

+        self._memory = memory if memory else DuckDBMemory()
+        self._target: PromptShieldTarget = target
+
+    async def score_async(self, request_response: PromptRequestPiece | PromptMemoryEntry) -> list[Score]:


We updated this interface to add one extra field, so this signature can throw an error if that field is passed.

rlundeen2 · 2024-08-06T16:23:19Z

pyrit/score/prompt_shield_scorer.py

+
+    def __init__(
+            self,
+            target: PromptShieldTarget,


nit: can we rename to prompt_shield_target so we can keep in mind it's specific?

rlundeen2 · 2024-08-06T16:25:21Z

pyrit/score/prompt_shield_scorer.py

+            self, 
+            request_response: Any
+        ) -> None:
+        if not isinstance(request_response, PromptRequestPiece) and not isinstance(request_response, PromptMemoryEntry):


We probably don't need to do type checking here, but it may be nice to check only that the data type is text, since the target only supports text.

(mypy should take care of type checking, more or less.)
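
A rough sketch of what a text-only check could look like in place of the isinstance checks. FakeRequestPiece is a hypothetical stand-in for the real request piece, and the converted_value_data_type attribute name is an assumption about its shape.

```python
from dataclasses import dataclass


@dataclass
class FakeRequestPiece:
    """Hypothetical stand-in for a request piece; not the PyRIT class."""

    converted_value: str
    converted_value_data_type: str = "text"  # assumed attribute name


def validate_text_only(piece: FakeRequestPiece) -> None:
    """Reject anything that is not text, since the target only handles text."""
    if piece.converted_value_data_type != "text":
        raise ValueError(
            f"Prompt Shield scorer only supports text, got {piece.converted_value_data_type!r}"
        )
```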

rlundeen2 · 2024-08-06T16:27:16Z

pyrit/score/prompt_shield_scorer.py

+        ) -> None:
+
+        self.scorer_type = "true_false"
+        self._conversation_id = str(uuid.uuid4())


The conversation ID should be created in score_async so that previous conversations aren't sent.

pyrit/score/prompt_shield_scorer.py

rlundeen2 · 2024-08-06T16:30:35Z

tests/score/test_prompt_shield_scorer.py

+def sample_conversations() -> list[PromptRequestPiece]:
+    return get_sample_conversations()
+
+@pytest.fixture


Can you check code coverage percentages and comment here on them?

ValbuenaVC added 4 commits July 2, 2024 15:58

Net_utility now accepts a 'params' argument.

beaa497

Added Prompt Shield target as a PromptTarget.

57dd8c6

Added Prompt Shield scorer.

ab055e5

Cleaned up Prompt Shield Target, Scorer.

8497d76

ValbuenaVC added 2 commits July 3, 2024 11:14

Added a tutorial doc and added features to the __init__.py of target and scorer so they can be imported.

9dc03b1

The tutorial Jupyter notebook

5eedcaa

rlundeen2 reviewed Jul 3, 2024

pyrit/common/net_utility.py (outdated)

rlundeen2 reviewed Jul 3, 2024

pyrit/prompt_target/prompt_shield_target.py (outdated)

rlundeen2 reviewed Jul 3, 2024

pyrit/prompt_target/prompt_shield_target.py

rlundeen2 reviewed Jul 3, 2024

pyrit/prompt_target/prompt_shield_target.py (outdated)

rlundeen2 reviewed Jul 3, 2024

pyrit/prompt_target/prompt_shield_target.py

rlundeen2 reviewed Jul 3, 2024

pyrit/prompt_target/prompt_shield_target.py

ValbuenaVC added 4 commits July 3, 2024 14:18

Added parsing to Prompt Shield to handle requests that may use one or two of the fields (userPrompt, documents) of the Prompt Shield endpoint. Expanding on the docs

5e76611

Expanding on the docs and parsing logic.

f154b44

Implemented scorer changes, but still noticing some bugs.

c616aec

Added a small script to help retrieve dates from prompt templates where the source is an ArXiv paper.

9f1aaea