Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT Add scale scorer #274

Merged
merged 16 commits into from
Jul 23, 2024
Merged

Conversation

romanlutz
Copy link
Contributor

@romanlutz romanlutz commented Jul 8, 2024

Description

So far we don't have a scale scorer that can accommodate more flexible scales than Likert and which don't require level-specific descriptions.

TAP, PAIR, and Crescendo all uses some variation of this. It could help to have a bigger scale and not requiring detailed descriptions of each level which are hard to create and sometimes difficult to distinguish. One could argue that there's no point in distinct levels if there's no describable distinction them but that's a whole different can of worms 😄

Tests and Documentation

Added tests. Holding off on extensive documentation for now since it uses a slightly different signature on the scoring methods which includes the task. If we end up doing the same on all existing scorers we can do a larger adjustment of the docs.

@romanlutz romanlutz marked this pull request as ready for review July 12, 2024 02:17
@romanlutz romanlutz mentioned this pull request Jul 12, 2024
Copy link
Contributor

@dlmgary dlmgary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking the PR merge but wanted to make sure comments were read before merging! Feel free to reach out to chat, changes are pretty minor. :)

pyrit/score/self_ask_scale_scorer.py Outdated Show resolved Hide resolved
@@ -16,36 +17,39 @@ class Scorer(abc.ABC):
scorer_type: ScoreType

@abstractmethod
async def score_async(self, request_response: PromptRequestPiece) -> list[Score]:
async def score_async(self, request_response: PromptRequestPiece, *, task: Optional[str] = None) -> list[Score]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question. Shouldn't the definition be as follow?

Suggested change
async def score_async(self, request_response: PromptRequestPiece, *, task: Optional[str] = None) -> list[Score]:
async def score_async(self, *, request_response: PromptRequestPiece, task: Optional[str] = None) -> list[Score]:

That's the pattern we've been used throughout the code. It will force this PR to modify more files but will keep things consistent and more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't put it there for three reasons. One is what you mentioned: it will break backwards compatibility. Perhaps not a huge reason, granted!

The other is that request_response is not a great name IMO and if I had to bet we'll change that sooner or later at which point that would mean another breaking change (or deprecation over several versions which is incredibly painful).

Lastly, as long as there's only 1 positional arg it's still clear. As soon as you have multiple then the order could be either way and things can get confusing. The task is truly "optional" in the sense that most scorers don't currently use it, so treating it differently makes sense to me.

pyrit/score/self_ask_category_scorer.py Outdated Show resolved Hide resolved

Returns:
list[Score]: A list of Score objects representing the results.
"""
raise NotImplementedError("score_async method not implemented")

@abstractmethod
def validate(self, request_response: PromptRequestPiece):
def validate(self, request_response: PromptRequestPiece, *, task: Optional[str] = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion as below.

Suggested change
def validate(self, request_response: PromptRequestPiece, *, task: Optional[str] = None):
def validate(self, *, request_response: PromptRequestPiece, task: Optional[str] = None):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyrit/score/self_ask_conversation_objective_scorer.py Outdated Show resolved Hide resolved
pyrit/score/self_ask_scale_scorer.py Show resolved Hide resolved
chat_target: PromptChatTarget,
scale_path: Optional[Path] = None,
scale: Optional[Scale] = None,
memory: MemoryInterface = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
memory: MemoryInterface = None,
memory: Optional[MemoryInterface] = None,

Pre-commit hooks should've complained about this! :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Comment on lines +122 to +124
self._system_prompt = scoring_instructions_template.apply_custom_metaprompt_parameters(**system_prompt_kwargs)

self._chat_target: PromptChatTarget = chat_target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove extra line here.

Suggested change
self._system_prompt = scoring_instructions_template.apply_custom_metaprompt_parameters(**system_prompt_kwargs)
self._chat_target: PromptChatTarget = chat_target
self._system_prompt = scoring_instructions_template.apply_custom_metaprompt_parameters(**system_prompt_kwargs)
self._chat_target: PromptChatTarget = chat_target

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the rule of thumb is that if flake8 / black etc. don't mind then neither do we (?)

pyrit/score/self_ask_scale_scorer.py Outdated Show resolved Hide resolved
@dlmgary dlmgary self-requested a review July 23, 2024 01:08
@romanlutz romanlutz dismissed dlmgary’s stale review July 23, 2024 02:46

As discussed I'm resetting this one and will check with you offline @dlmgary ! If needed I'll start a separate follow-up PR

@romanlutz romanlutz merged commit 61a7502 into Azure:main Jul 23, 2024
4 checks passed
@romanlutz romanlutz deleted the romanlutz/scale_scorer branch July 23, 2024 04:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants