Code and dataset for the paper: MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset (https://arxiv.org/pdf/2406.02106).

HKUST-KnowComp/MARS

🪐MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset

This is the official code and data repository for the paper: 🪐MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset.

Overview

1. Download Dataset/Model Checkpoints

The 🪐MARS benchmark and our best model checkpoints on three tasks in 🪐MARS can be downloaded at this link.

2. Benchmark Curation

Code for instructing ChatGPT to curate the 🪐MARS benchmark can be found in the benchmark_curation folder.

3. Evaluation

Code for evaluating language models on the 🪐MARS benchmark can be found in the evaluation folder.

4. Citing this work

Please use the BibTeX entry below to cite our paper:

@inproceedings{Wang2024MARSBT,
  title={MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset},
  author={Weiqi Wang and Yangqiu Song},
  year={2024},
  url={https://doi.org/10.48550/arXiv.2406.02106},
  doi={10.48550/arXiv.2406.02106}
}

5. Acknowledgement

The authors of this paper were supported by the NSFC Fund (U20B2053) from the NSFC of China, and by the RIF (R6020-19 and R6021-20) and the GRF (16211520 and 16205322) from the RGC of Hong Kong. We also acknowledge support from the UGC Research Matching Grants (RMGS20EG01-D, RMGS20CR11, RMGS20CR12, RMGS20EG19, RMGS20EG21, RMGS23CR05, RMGS23EG08).
