FEAT: Adding crescendo #151
Conversation
# Adapting the chat history to the Llama-2 format
def langhchain_to_llama2(inputObject,Llama2Tokenizer):
Suggested change:
- def langhchain_to_llama2(inputObject,Llama2Tokenizer):
+ def langchain_to_llama2(inputObject,Llama2Tokenizer):
# from langchain_openai import AzureChatOpenAI
from langchain.schema import AIMessage
from langchain.schema import HumanMessage
# from transformers import AutoModelForCausalLM
Please remove unused imports. Relatedly, you may want to run `pre-commit run --all-files` if you haven't yet.
class CrescendoLLM():
This needs a docstring.
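For example, something along these lines (the wording is only a sketch, adjust it to what the class actually does):

```python
class CrescendoLLM():
    """
    Wraps calls to an LLM for the Crescendo attack, adapting the chat history to
    the format each backend expects (OpenAI, Llama-2, Gemini, ...).
    """
```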
return False,'Error: Incorrect JSON format'

# Adapting the chat history to the Gimini format
Suggested change:
- # Adapting the chat history to the Gimini format
+ # Adapting the chat history to the Gemini format
return history

# Calling the LLM. Adapting the input format depending on the LLM
def callLLM(self, inputObject):
Suggested change:
- def callLLM(self, inputObject):
+ def call_llm(self, inputObject):
nit: method names should be snake case
counter = 0
counter2 = 0
something a bit more descriptive would help 🙂
# Setting the question/prompt as the next message to the target model
history.append(HumanMessage(content=questionToAsk))
print("Prompt: %s"%questionToAsk)
Please use f-strings here and elsewhere
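For example, the print above would become:

```python
print(f"Prompt: {questionToAsk}")
```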
# Checking if the target model refused to respond
if flag:
    # Checking if the target model refused to respond due to ethical/legal reasons using a refuse judge
    # Refuse judge is a the Judge with a different prompt meta prompt that checks if the response of the target model is a rejection for an ethical/legal reason
Suggested change:
- # Refuse judge is a the Judge with a different prompt meta prompt that checks if the response of the target model is a rejection for an ethical/legal reason
+ # Refuse judge is a the Judge with a different meta prompt that checks if the response of the target model is a rejection for an ethical/legal reason
refuseJudgeResults = self._judge_llm.evaluateInputGPT4(response.content,questionToAsk,mode='refuseJudge')
refuseJudgeFlag = refuseJudgeResults['resultFlag']

flag = not (refuseJudgeFlag == 'true' or refuseJudgeFlag == 'True' or refuseJudgeFlag == True)
Should this check be inside the judge code and just return a bool?
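Something like this on the judge (the method and key names are taken from the snippet above; `is_refusal` itself is just a suggested name) would let the caller work with a plain bool:

```python
def is_refusal(self, response_text: str, question: str) -> bool:
    """Return True if the target's response is judged to be a refusal."""
    results = self.evaluateInputGPT4(response_text, question, mode='refuseJudge')
    # Normalize 'true'/'True'/True to a real bool so callers don't string-compare.
    return str(results['resultFlag']).lower() == 'true'
```

The orchestrator side then reduces to `flag = not judge.is_refusal(response.content, questionToAsk)`.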
# Generating the next question/prompt of Crescendo
def getNewQuestion(attackerLLM,metaPromptFile,lastResponse,jailbreakTarget,questionsWithSummary=None,verbose=False,round=None,resultFlag=None):
All the methods need docstrings. Also, do you want this to be exposed and have users call it? Or should they only use auto_crescendo? Assuming it's the latter I'd suggest starting the names with underscores, so e.g., _get_new_question
lastResponse = ''
pathToSave = './%s'%folderName
# The history of the conversation for the target model
history = []
Conversation history would work better with memory than saving to files
you would get the conversation history using the memory interface
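Roughly like this (the import and method names here are assumptions about the memory API, so double-check them against `pyrit.memory` rather than copying this):

```python
from pyrit.memory import FileMemory  # assumed: any MemoryInterface implementation works here

memory = FileMemory()
conversation_id = "crescendo-run-1"  # illustrative

# Assumed method name; the point is to read the stored chat messages back from
# memory instead of re-reading them from files on disk.
history = memory.get_chat_messages_with_conversation_id(conversation_id=conversation_id)
```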
judge_llm: CrescendoJudge,
metaPromptsPath: str,
crescendo_variant: int = 1,
max_rounds: int = 10,
Should also have memory passed in and initialized
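Something like this in the constructor, mirroring the parameters above (the `MemoryInterface` typing and defaulting are a sketch; keep them consistent with the other orchestrators):

```python
from pyrit.memory import MemoryInterface  # import path assumed

def __init__(
    self,
    judge_llm: CrescendoJudge,
    metaPromptsPath: str,
    crescendo_variant: int = 1,
    max_rounds: int = 10,
    memory: MemoryInterface = None,
) -> None:
    # Sketch: the conversation history then lives in memory rather than in files.
    self._memory = memory
```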
from pyrit.orchestrator import Orchestrator
from pyrit.prompt_target import CrescendoLLM
from pyrit.score import CrescendoJudge
import os
This (and target) should have unit tests for all public methods
And part of that is probably refactoring. Right now this is tough to follow; try to break it up into small understandable functions that can be tested for correctness.
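As a rough illustration: once something like the refusal-flag check earlier in this review is pulled out into a small helper (names here are hypothetical), it becomes trivially testable:

```python
def parse_refusal_flag(result: dict) -> bool:
    # Hypothetical helper extracted from the inline 'resultFlag' check.
    return str(result.get("resultFlag", "")).lower() == "true"


def test_parse_refusal_flag():
    assert parse_refusal_flag({"resultFlag": "True"}) is True
    assert parse_refusal_flag({"resultFlag": "false"}) is False
    assert parse_refusal_flag({}) is False
```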
print(f'Response: {lastResponse}')
if usePerspectiveAPI:
    # Evaluate the response using the Perspective API
    answerScore, toxicityCategory,allScores = self._judge_llm.evaluateInput(lastResponse)
a lot of this code I think should be abstracted to the scorer/judge
Scorer/Judge code is responsible for evaluating responses from a target. The orchestrator's job is to decide what to do with that evaluation.
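As a sketch of that split (function names and the threshold are placeholders; `evaluateInput` is the judge call already in the PR):

```python
def evaluate_response(judge: CrescendoJudge, last_response: str) -> dict:
    # Judge/scorer responsibility: how the response is evaluated
    # (Perspective API, content filters, GPT-4 judge, ...).
    answer_score, toxicity_category, all_scores = judge.evaluateInput(last_response)
    return {"score": answer_score, "category": toxicity_category}


def decide_next_step(evaluation: dict, round_number: int, max_rounds: int) -> str:
    # Orchestrator responsibility: what to do with that evaluation.
    if evaluation["score"] >= 0.9:      # threshold is illustrative only
        return "stop"
    if round_number >= max_rounds:
        return "give_up"
    return "generate_next_prompt"
```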
from transformers import AutoModelForCausalLM
from vllm import LLM, SamplingParams
from transformers import LlamaForCausalLM, AutoTokenizer
import time
Roman had a similar comment, but the target should mostly just be how to communicate with the endpoint. For example, you probably don't need a separate target for crescendo. But you would need an "Anthropic" target (which we don't have yet) to do crescendo against Anthropic
This also doesn't implement the required interface for targets, so can't plug into the rest of the framework as is
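For reference, plugging into the framework means subclassing the target base class and implementing its abstract send method; very roughly (take the exact abstract method names and signatures from `pyrit.prompt_target.PromptTarget`, not from this sketch):

```python
from pyrit.prompt_target import PromptTarget


class ExampleCrescendoTarget(PromptTarget):  # class name is illustrative
    def send_prompt(self, *, normalized_prompt: str, conversation_id: str, normalizer_id: str) -> str:
        # Assumed signature: forward the prompt to the underlying endpoint
        # (Azure OpenAI today, an Anthropic target later) and return the response.
        raise NotImplementedError
```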
azureCFHate_severity=0
azureCFSelfHarm_severity=0
azureCFSexual_severity=0
azureCFViolence_severity=0
Update variable names to be snake case.
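For the block above, that would look like:

```python
azure_cf_hate_severity = 0
azure_cf_self_harm_severity = 0
azure_cf_sexual_severity = 0
azure_cf_violence_severity = 0
```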
print('Meta judge was triggered and will flip the decision')
print('Meta reasoning: %s'%metaResoning)
we should use the logging module instead of print statements.
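For example, the two prints above would become:

```python
import logging

logger = logging.getLogger(__name__)

logger.info("Meta judge was triggered and will flip the decision")
logger.info("Meta reasoning: %s", metaResoning)
```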
else:
    return self.evalAzureContentFilterAPI(text,counter+1)

except Exception as ex:
we should be more specific about the exception we're catching.
else:
    output = llm.invoke(inputObject)
    return True, output
except Exception as e:
we should be catching a more narrow exception. :)
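For instance (the concrete exception types are placeholders; use whatever errors the LLM client / LangChain actually raises):

```python
def _invoke_llm(llm, input_object):
    # Sketch: catch only the errors the client is expected to raise, and let
    # unexpected bugs propagate instead of being silently swallowed.
    try:
        output = llm.invoke(input_object)
        return True, output
    except (TimeoutError, ConnectionError) as e:  # placeholder exception types
        return False, f"Error: {e}"
```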
# from langchain_openai import AzureChatOpenAI
from langchain.schema import AIMessage
from langchain.schema import HumanMessage
I recommend we have a team discussion about taking a hard dependency on LangChain. If we decide to go this route, then a lot of the other components we've created can be migrated to conform to LangChain's interfaces.
Closing in favor of this: #275
Description
Adding the code for Crescendomation to PyRIT
Tests and Documentation