Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models

Environment Setup

  1. conda create --name mathador python=3.11 -y
  2. conda activate mathador
  3. pip install -r requirements.txt
  4. Obtain a personal API key for one of the supported providers: OpenAI, TogetherAI, or Anthropic.
  5. Open eval.yaml and configure which models to evaluate; examples for all three providers are included (a sketch of the file's shape follows this list).
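
For orientation, here is a minimal sketch of what eval.yaml might look like. The keys shown (models, provider, name, shots, dataset, output) are assumptions for illustration only; the eval.yaml shipped in the repository is authoritative.

# Hypothetical eval.yaml sketch -- keys and values are illustrative,
# not the repository's actual schema.
models:
  - provider: openai                  # or: together, anthropic
    name: gpt-4o
  - provider: together
    name: meta-llama/Llama-3-70b-chat-hf
  - provider: anthropic
    name: claude-3-5-sonnet-20240620
shots: 5                              # few-shot examples per prompt
dataset: mathador-10000.jsonl         # dataset instance to evaluate on
output: results.csv                   # where results are written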

Usage

For convenience, we include the mathador-10000.jsonl dataset, which we used for some of our runs. If you would like to generate a new instance of the dataset, configure generate_dataset.yaml and run:

python generate_dataset.py generate_dataset.yaml
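
The generation config controls the size and destination of the emitted dataset. A hedged sketch of generate_dataset.yaml follows; the keys (num_samples, seed, output_path) are hypothetical and may not match the repository's actual schema.

# Hypothetical generate_dataset.yaml sketch -- keys are illustrative only.
num_samples: 10000                 # number of Mathador problem instances to generate
seed: 42                           # RNG seed, for reproducible dataset instances
output_path: mathador-10000.jsonl  # where the generated JSONL is written

Regenerating produces a fresh set of problems each time, in line with the benchmark's dynamic design.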

To run the Mathador-LM benchmark, specify your desired parameters in eval.yaml and run:

TOGETHER_API_KEY=<your_key> python eval.py eval.yaml

Substitute the API-key variable for your provider where appropriate (conventionally OPENAI_API_KEY or ANTHROPIC_API_KEY; check the repository code for the exact names it reads).

If you would like to override arguments from eval.yaml directly from the command line, append key=value pairs:

TOGETHER_API_KEY=<your_key> python eval.py eval.yaml shots=20

The evaluation results will be saved to results.csv.
