FEAT: Adding crescendo #151
Conversation
# Adapting the chat history to the Llama-2 format
def langhchain_to_llama2(inputObject,Llama2Tokenizer):
Suggested change:
- def langhchain_to_llama2(inputObject,Llama2Tokenizer):
+ def langchain_to_llama2(inputObject,Llama2Tokenizer):
# from langchain_openai import AzureChatOpenAI
from langchain.schema import AIMessage
from langchain.schema import HumanMessage
# from transformers import AutoModelForCausalLM
Please remove unused imports. Relatedly, you may want to run `pre-commit run --all-files` if you haven't yet.
class CrescendoLLM():
This needs a docstring.
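For example, something along these lines (the wording is only a sketch, adjust it to what the class actually does):

```python
class CrescendoLLM():
    """
    Wraps calls to an LLM for the Crescendo attack, adapting the chat history to
    the format each backend expects (OpenAI, Llama-2, Gemini, ...).
    """
```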
return False,'Error: Incorrect JSON format'

# Adapting the chat history to the Gimini format
Suggested change:
- # Adapting the chat history to the Gimini format
+ # Adapting the chat history to the Gemini format
return history

# Calling the LLM. Adapting the input format depending on the LLM
def callLLM(self, inputObject):
Suggested change:
- def callLLM(self, inputObject):
+ def call_llm(self, inputObject):
nit: method names should be snake case
counter = 0
counter2 = 0
something a bit more descriptive would help 🙂
# Setting the question/prompt as the next message to the target model
history.append(HumanMessage(content=questionToAsk))
print("Prompt: %s"%questionToAsk)
Please use f-strings here and elsewhere
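For example, the print above would become:

```python
print(f"Prompt: {questionToAsk}")
```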
# Checking if the target model refused to respond
if flag:
    # Checking if the target model refused to respond due to ethical/legal reasons using a refuse judge
    # Refuse judge is a the Judge with a different prompt meta prompt that checks if the response of the target model is a rejection for an ethical/legal reason
Suggested change:
- # Refuse judge is a the Judge with a different prompt meta prompt that checks if the response of the target model is a rejection for an ethical/legal reason
+ # Refuse judge is a the Judge with a different meta prompt that checks if the response of the target model is a rejection for an ethical/legal reason
refuseJudgeResults = self._judge_llm.evaluateInputGPT4(response.content,questionToAsk,mode='refuseJudge')
refuseJudgeFlag = refuseJudgeResults['resultFlag']

flag = not (refuseJudgeFlag == 'true' or refuseJudgeFlag == 'True' or refuseJudgeFlag == True)
Should this check be inside the judge code and just return a bool?
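Something like this on the judge (the method and key names are taken from the snippet above; `is_refusal` itself is just a suggested name) would let the caller work with a plain bool:

```python
def is_refusal(self, response_text: str, question: str) -> bool:
    """Return True if the target's response is judged to be a refusal."""
    results = self.evaluateInputGPT4(response_text, question, mode='refuseJudge')
    # Normalize 'true'/'True'/True to a real bool so callers don't string-compare.
    return str(results['resultFlag']).lower() == 'true'
```

The orchestrator side then reduces to `flag = not judge.is_refusal(response.content, questionToAsk)`.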
# Generating the next question/prompt of Crescendo
def getNewQuestion(attackerLLM,metaPromptFile,lastResponse,jailbreakTarget,questionsWithSummary=None,verbose=False,round=None,resultFlag=None):
All the methods need docstrings. Also, do you want this to be exposed and have users call it? Or should they only use auto_crescendo? Assuming it's the latter I'd suggest starting the names with underscores, so e.g., _get_new_question
lastResponse = ''
pathToSave = './%s'%folderName
# The history of the conversation for the target model
history = []
Conversation history would work better with memory than saving to files
you would get the conversation history using the memory interface
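Roughly like this (the import and method names here are assumptions about the memory API, so double-check them against `pyrit.memory` rather than copying this):

```python
from pyrit.memory import FileMemory  # assumed: any MemoryInterface implementation works here

memory = FileMemory()
conversation_id = "crescendo-run-1"  # illustrative

# Assumed method name; the point is to read the stored chat messages back from
# memory instead of re-reading them from files on disk.
history = memory.get_chat_messages_with_conversation_id(conversation_id=conversation_id)
```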
judge_llm: CrescendoJudge,
metaPromptsPath: str,
crescendo_variant: int = 1,
max_rounds: int = 10,
Should also have memory passed in and initialized
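Something like this in the constructor, mirroring the parameters above (the `MemoryInterface` typing and defaulting are a sketch; keep them consistent with the other orchestrators):

```python
from pyrit.memory import MemoryInterface  # import path assumed

def __init__(
    self,
    judge_llm: CrescendoJudge,
    metaPromptsPath: str,
    crescendo_variant: int = 1,
    max_rounds: int = 10,
    memory: MemoryInterface = None,
) -> None:
    # Sketch: the conversation history then lives in memory rather than in files.
    self._memory = memory
```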
from pyrit.orchestrator import Orchestrator
from pyrit.prompt_target import CrescendoLLM
from pyrit.score import CrescendoJudge
import os
This (and target) should have unit tests for all public methods
And part of that is probably refactoring. Right now this is tough to follow; try to break it up into small understandable functions that can be tested for correctness.
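As a rough illustration: once something like the refusal-flag check earlier in this review is pulled out into a small helper (names here are hypothetical), it becomes trivially testable:

```python
def parse_refusal_flag(result: dict) -> bool:
    # Hypothetical helper extracted from the inline 'resultFlag' check.
    return str(result.get("resultFlag", "")).lower() == "true"


def test_parse_refusal_flag():
    assert parse_refusal_flag({"resultFlag": "True"}) is True
    assert parse_refusal_flag({"resultFlag": "false"}) is False
    assert parse_refusal_flag({}) is False
```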
print(f'Response: {lastResponse}')
if usePerspectiveAPI:
    # Evaluate the response using the Perspective API
    answerScore, toxicityCategory,allScores = self._judge_llm.evaluateInput(lastResponse)
a lot of this code I think should be abstracted to the scorer/judge
Scorer/Judge code is responsible for evaluating responses from a target. The orchestrator's job is to decide what to do with that evaluation.
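As a sketch of that split (function names and the threshold are placeholders; `evaluateInput` is the judge call already in the PR):

```python
def evaluate_response(judge: CrescendoJudge, last_response: str) -> dict:
    # Judge/scorer responsibility: how the response is evaluated
    # (Perspective API, content filters, GPT-4 judge, ...).
    answer_score, toxicity_category, all_scores = judge.evaluateInput(last_response)
    return {"score": answer_score, "category": toxicity_category}


def decide_next_step(evaluation: dict, round_number: int, max_rounds: int) -> str:
    # Orchestrator responsibility: what to do with that evaluation.
    if evaluation["score"] >= 0.9:      # threshold is illustrative only
        return "stop"
    if round_number >= max_rounds:
        return "give_up"
    return "generate_next_prompt"
```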
from transformers import AutoModelForCausalLM
from vllm import LLM, SamplingParams
from transformers import LlamaForCausalLM, AutoTokenizer
import time
Roman had a similar comment, but the target should mostly just be how to communicate with the endpoint. For example, you probably don't need a separate target for crescendo. But you would need an "Anthropic" target (which we don't have yet) to do crescendo against Anthropic
This also doesn't implement the required interface for targets, so can't plug into the rest of the framework as is
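For reference, plugging into the framework means subclassing the target base class and implementing its abstract send method; very roughly (take the exact abstract method names and signatures from `pyrit.prompt_target.PromptTarget`, not from this sketch):

```python
from pyrit.prompt_target import PromptTarget


class ExampleCrescendoTarget(PromptTarget):  # class name is illustrative
    def send_prompt(self, *, normalized_prompt: str, conversation_id: str, normalizer_id: str) -> str:
        # Assumed signature: forward the prompt to the underlying endpoint
        # (Azure OpenAI today, an Anthropic target later) and return the response.
        raise NotImplementedError
```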
azureCFHate_severity=0
azureCFSelfHarm_severity=0
azureCFSexual_severity=0
azureCFViolence_severity=0
Update variable names to be snake case.
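For the block above, that would look like:

```python
azure_cf_hate_severity = 0
azure_cf_self_harm_severity = 0
azure_cf_sexual_severity = 0
azure_cf_violence_severity = 0
```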
print('Meta judge was triggered and will flip the decision')
print('Meta reasoning: %s'%metaResoning)
we should use the logging module instead of print statements.
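For example, the two prints above would become:

```python
import logging

logger = logging.getLogger(__name__)

logger.info("Meta judge was triggered and will flip the decision")
logger.info("Meta reasoning: %s", metaResoning)
```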
else:
    return self.evalAzureContentFilterAPI(text,counter+1)

except Exception as ex:
we should be more specific about the exception we're catching.
else:
    output = llm.invoke(inputObject)
    return True, output
except Exception as e:
we should be catching a more narrow exception. :)
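For instance (the concrete exception types are placeholders; use whatever errors the LLM client / LangChain actually raises):

```python
def _invoke_llm(llm, input_object):
    # Sketch: catch only the errors the client is expected to raise, and let
    # unexpected bugs propagate instead of being silently swallowed.
    try:
        output = llm.invoke(input_object)
        return True, output
    except (TimeoutError, ConnectionError) as e:  # placeholder exception types
        return False, f"Error: {e}"
```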
# from langchain_openai import AzureChatOpenAI
from langchain.schema import AIMessage
from langchain.schema import HumanMessage
I recommend we have a team discussion about taking a hard dependency on LangChain. If we decide to go this route, then a lot of the other components we've created can be migrated to conform to LangChain's interfaces.
Closing in favor of this: #275
Description
Adding the code for Crescendomation to PyRIT
Tests and Documentation