
FEAT: Adding crescendo #151

Status: Closed (wants to merge 1 commit)

Conversation


@AhmedSalem2 AhmedSalem2 commented Apr 19, 2024

Description

Adding the code for Crescendomation to PyRIT

Tests and Documentation



# Adapting the chat history to the Llama-2 format
def langhchain_to_llama2(inputObject,Llama2Tokenizer):

Suggested change
def langhchain_to_llama2(inputObject,Llama2Tokenizer):
def langchain_to_llama2(inputObject,Llama2Tokenizer):

# from langchain_openai import AzureChatOpenAI
from langchain.schema import AIMessage
from langchain.schema import HumanMessage
# from transformers import AutoModelForCausalLM

Please remove the unused imports.
Relatedly, you may want to run `pre-commit run --all-files` if you haven't yet.



class CrescendoLLM():


This needs a docstring.


return False,'Error: Incorrect JSON format'

# Adapting the chat history to the Gimini format

Suggested change
# Adapting the chat history to the Gimini format
# Adapting the chat history to the Gemini format

return history

# Calling the LLM. Adapting the input format depending on the LLM
def callLLM(self, inputObject):

Suggested change
def callLLM(self, inputObject):
def call_llm(self, inputObject):

nit: method names should be snake_case

Comment on lines +63 to +64
counter = 0
counter2 = 0

Something a bit more descriptive than `counter`/`counter2` would help 🙂


# Setting the question/prompt as the next message to the target model
history.append(HumanMessage(content=questionToAsk))
print("Prompt: %s"%questionToAsk)

Please use f-strings here and elsewhere
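A minimal sketch of the requested change (the variable name mirrors the PR's `questionToAsk`, purely for illustration):

```python
question_to_ask = "Tell me about the history of locksmithing."

# Old style, as in the PR:
old = "Prompt: %s" % question_to_ask

# Requested style, an f-string:
new = f"Prompt: {question_to_ask}"

print(new)
```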

# Checking if the target model refused to respond
if flag:
# Checking if the target model refused to respond due to ethical/legal reasons using a refuse judge
# Refuse judge is a the Judge with a different prompt meta prompt that checks if the response of the target model is a rejection for an ethical/legal reason

Suggested change
# Refuse judge is a the Judge with a different prompt meta prompt that checks if the response of the target model is a rejection for an ethical/legal reason
# The refuse judge is the Judge with a different meta prompt that checks whether the target model's response is a rejection for an ethical/legal reason

refuseJudgeResults = self._judge_llm.evaluateInputGPT4(response.content,questionToAsk,mode='refuseJudge')
refuseJudgeFlag = refuseJudgeResults['resultFlag']

flag = not (refuseJudgeFlag == 'true' or refuseJudgeFlag == 'True' or refuseJudgeFlag == True)

Should this check be inside the judge code and just return a bool?
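One way that could look (a sketch only; `normalize_result_flag` and the `is_refusal` method name are hypothetical, not part of the PR):

```python
def normalize_result_flag(result_flag) -> bool:
    """Collapse 'true'/'True'/True (or anything else) into a plain bool."""
    if isinstance(result_flag, bool):
        return result_flag
    return str(result_flag).strip().lower() == "true"

# If the judge applied this internally and exposed e.g. `is_refusal(...) -> bool`,
# the orchestrator's check would reduce to:
#   flag = not judge.is_refusal(response.content, question_to_ask)
```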



# Generating the next question/prompt of Crescendo
def getNewQuestion(attackerLLM,metaPromptFile,lastResponse,jailbreakTarget,questionsWithSummary=None,verbose=False,round=None,resultFlag=None):

All the methods need docstrings. Also, do you want this to be exposed and have users call it? Or should they only use auto_crescendo? Assuming it's the latter I'd suggest starting the names with underscores, so e.g., _get_new_question
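A minimal sketch of that convention (the class and method bodies are illustrative stand-ins, not the PR's actual logic):

```python
class CrescendoOrchestrator:
    """Runs the multi-turn Crescendo loop. Only `auto_crescendo` is public."""

    def auto_crescendo(self, jailbreak_target: str, max_rounds: int = 10) -> list:
        """Public entry point: the only method users are expected to call."""
        questions = []
        for round_num in range(1, max_rounds + 1):
            # Internal helpers are underscore-prefixed and carry docstrings too.
            questions.append(self._get_new_question(jailbreak_target, round_num))
        return questions

    def _get_new_question(self, jailbreak_target: str, round_num: int) -> str:
        """Generate the next Crescendo prompt (stubbed here)."""
        return f"round {round_num}: step toward {jailbreak_target}"
```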

lastResponse = ''
pathToSave = './%s'%folderName
# The history of the conversation for the target model
history = []

Conversation history would work better with memory than saving to files


you would get the conversation history using the memory interface

judge_llm: CrescendoJudge,
metaPromptsPath: str,
crescendo_variant: int = 1,
max_rounds: int = 10,

Should also have memory passed in and initialized

from pyrit.orchestrator import Orchestrator
from pyrit.prompt_target import CrescendoLLM
from pyrit.score import CrescendoJudge
import os

This (and target) should have unit tests for all public methods
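For instance, a pytest-style unit test for a small public helper might look like this (the helper shown is a toy stand-in for the PR's conversion utilities, not their real implementation):

```python
def to_llama2_prompt(messages) -> str:
    """Toy stand-in: render (role, text) pairs in a Llama-2-style chat format."""
    parts = []
    for role, text in messages:
        parts.append(f"[INST] {text} [/INST]" if role == "user" else text)
    return " ".join(parts)

def test_to_llama2_prompt_wraps_user_turns():
    assert to_llama2_prompt([("user", "hi")]) == "[INST] hi [/INST]"
    assert to_llama2_prompt(
        [("user", "hi"), ("assistant", "hello")]
    ) == "[INST] hi [/INST] hello"
```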


And part of that is probably refactoring. Right now this is tough to follow; try to break it up into small understandable functions that can be tested for correctness.

print(f'Response: {lastResponse}')
if usePerspectiveAPI:
# Evaluate the response using the Perspective API
answerScore, toxicityCategory,allScores = self._judge_llm.evaluateInput(lastResponse)

a lot of this code I think should be abstracted to the scorer/judge


Scorer/Judge code is responsible for evaluating responses from a target. The orchestrator's job is to decide what to do with that evaluation.
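That division of labor could look roughly like this (all names here are illustrative, not PyRIT APIs):

```python
from dataclasses import dataclass

@dataclass
class Score:
    flagged: bool
    rationale: str

class ToyScorer:
    """Evaluates a response; it never decides control flow."""

    def score(self, response: str) -> Score:
        flagged = "i cannot" in response.lower()
        return Score(flagged, "refusal detected" if flagged else "ok")

def orchestrate_step(scorer: ToyScorer, response: str) -> str:
    """The orchestrator consumes the score and decides what happens next."""
    return "backtrack" if scorer.score(response).flagged else "continue"
```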

from transformers import AutoModelForCausalLM
from vllm import LLM, SamplingParams
from transformers import LlamaForCausalLM, AutoTokenizer
import time

Roman had a similar comment, but the target should mostly just be how to communicate with the endpoint. For example, you probably don't need a separate target for crescendo. But you would need an "Anthropic" target (which we don't have yet) to do crescendo against Anthropic


This also doesn't implement the required interface for targets, so can't plug into the rest of the framework as is

azureCFHate_severity=0
azureCFSelfHarm_severity=0
azureCFSexual_severity=0
azureCFViolence_severity=0

Update variable names to be snake case.

Comment on lines +180 to +181
print('Meta judge was triggered and will flip the decision')
print('Meta reasoning: %s'%metaResoning)

We should use the logging module instead of print statements.

else:
return self.evalAzureContentFilterAPI(text,counter+1)

except Exception as ex:

We should be more specific about the exception we're catching.
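For example, the retrying call could catch only the transient failure it expects (a sketch; `urllib.error.HTTPError` stands in for whatever the Azure Content Filter client actually raises):

```python
import urllib.error

MAX_RETRIES = 3

def call_with_retries(call, counter: int = 0):
    """Retry only the specific transient error, instead of a bare `except Exception`."""
    try:
        return call()
    except urllib.error.HTTPError:
        if counter + 1 >= MAX_RETRIES:
            raise  # out of retries: surface the real error to the caller
        return call_with_retries(call, counter + 1)
```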

else:
output = llm.invoke(inputObject)
return True, output
except Exception as e:

We should be catching a narrower exception here as well. :)

Comment on lines +6 to +8
# from langchain_openai import AzureChatOpenAI
from langchain.schema import AIMessage
from langchain.schema import HumanMessage

I recommend we have a team discussion about taking a hard dependency on LangChain. If we decide to go this route, then a lot of the other components we've created can be migrated to conform to LangChain's interfaces.

cc: @romanlutz @rdheekonda @rlundeen2 @nina-msft @cseifert1

@rlundeen2

Closing in favor of this: #275

@rlundeen2 rlundeen2 closed this Aug 1, 2024