
FEAT: add support for question answering benchmark #94

Merged
merged 16 commits into Azure:main on Mar 19, 2024

Conversation

dlmgary
Contributor

@dlmgary commented on Mar 12, 2024

Description

This PR:

  • Adds support for a question answering benchmark for language models.
  • Implements a new QuestionAnsweringBenchmarkOrchestrator to run question answering evaluations against different benchmarks.
  • Implements a new QuestionAnswerScorer to handle the question answering scoring logic after model inference (a minimal scoring sketch follows this list).
  • Adds the WMDP dataset for bio, chem, and cyber from The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.
  • Implements a new QuestionAnsweringDataset model to store and distribute question answering datasets in the future.
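
As a minimal sketch of the scoring idea behind `QuestionAnswerScorer` (the entry fields and helper below are illustrative assumptions, not the PR's actual implementation):

```python
# Illustrative sketch only; the real QuestionAnswerScorer in this PR may differ.
from dataclasses import dataclass


@dataclass
class QuestionAnsweringEntry:
    """One multiple-choice question (assumed shape; see the dataset example below)."""

    question: str
    choices: list[str]
    correct_index: int


def score_answer(entry: QuestionAnsweringEntry, model_response: str) -> bool:
    """Return True if the model response names the correct choice by index or by text."""
    response = model_response.strip().lower()
    correct_text = entry.choices[entry.correct_index].lower()
    return response == str(entry.correct_index) or correct_text in response
```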

Tests

  • no new tests required
  • new tests added
  • existing tests adjusted

Documentation

  • no documentation changes needed
  • documentation added or edited
  • example notebook added or updated

The Weapons of Mass Destruction Proxy (WMDP) benchmark is a publicly available dataset published in Li, Nathaniel, et al. "The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning." arXiv preprint arXiv:2403.03218 (2024). The raw dataset is available at https://github.com/centerforaisafety/wmdp.

The format of the dataset has been updated to be compliant with the `QuestionAnsweringDataset` PyRIT model so it can be used for Q&A evaluations. The content of the dataset has not been changed.
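
For illustration, a reformatted multiple-choice entry might look roughly like the Python dict below; the field names are assumptions for this sketch, not the actual `QuestionAnsweringDataset` schema:

```python
# Hypothetical entry after reformatting; field names are assumptions, and the
# question/choice strings are placeholders rather than real WMDP content.
qa_entry = {
    "question": "Example question text?",
    "choices": ["Choice A", "Choice B", "Choice C", "Choice D"],
    "answer_index": 1,  # position of the correct choice in `choices`
}
```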
@dlmgary requested a review from nina-msft on March 18, 2024.
- create a new `QuestionAnswerScorer`
- move the default system prompt to a new YAML file
- refactor `QuestionAnsweringBenchmarkOrchestrator` to accept a scorer during initialization (see the sketch after this list)
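
A self-contained sketch of the pattern those commits describe, with the default system prompt loaded from YAML and the scorer injected at construction time; the class, argument names, and YAML key here are assumptions, not the PR's actual API:

```python
# Sketch of the "scorer passed at initialization" pattern; not the PR's actual code.
import yaml  # assumes PyYAML is installed


class SketchBenchmarkOrchestrator:
    def __init__(self, scorer, system_prompt_path: str):
        self.scorer = scorer
        # The default system prompt now lives in a YAML file rather than in code;
        # the "template" key is an assumption for this sketch.
        with open(system_prompt_path) as f:
            self.system_prompt = yaml.safe_load(f)["template"]

    def evaluate(self, entries, ask_model):
        """Score each entry; ask_model is any callable(system_prompt, entry) -> str."""
        return [
            self.scorer(entry, ask_model(self.system_prompt, entry))
            for entry in entries
        ]
```

With the earlier `score_answer` helper passed as the scorer, `evaluate` would return one boolean per question.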
@dlmgary changed the title from "[draft] feat: add support for question answering benchmark" to "feat: add support for question answering benchmark" on Mar 19, 2024.
@dlmgary changed the title from "feat: add support for question answering benchmark" to "FEAT: add support for question answering benchmark" on Mar 19, 2024.
@dlmgary merged commit a06328d into Azure:main on Mar 19, 2024.
4 checks passed
@dlmgary deleted the dlmgary_benchmarking branch on March 19, 2024.