
[Feature] Add reasonbench dataset #577

Merged · 11 commits into open-compass:main · Dec 20, 2023

Conversation

Skyfall-xzz
Contributor

Motivation

Add a benchmark that specifically evaluates the reasoning abilities of LLMs.

Modification

Add preprocessing methods and configs for several datasets, aimed at evaluating inductive reasoning, deductive reasoning, abductive reasoning, causal reasoning, symbolic reasoning, and commonsense reasoning.
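
For context, a minimal sketch of what one such preprocessing method might look like; the class name, raw-file format, and field names below are assumptions for illustration, not the exact code in this PR:

```python
# Illustrative sketch only: the class name, file format, and field names
# are assumptions, not the exact code added in this PR.
import json

from datasets import Dataset

from opencompass.datasets.base import BaseDataset
from opencompass.registry import LOAD_DATASET


@LOAD_DATASET.register_module()
class ReasonBenchDataset(BaseDataset):  # hypothetical name

    @staticmethod
    def load(path: str) -> Dataset:
        """Map raw JSONL records onto the columns the reader_cfg expects."""
        rows = []
        with open(path, 'r', encoding='utf-8') as f:
            for line in f:
                item = json.loads(line)
                rows.append({
                    'question': item['question'],
                    'options': item.get('options', []),
                    'answer': item['answer'],
                })
        return Dataset.from_list(rows)
```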

BC-breaking (Optional)

N/A.

Use cases (Optional)

After ensuring that OpenCompass is installed correctly and the datasets are prepared, you can evaluate the performance of the LLaMA-7b model on the ReasonBench datasets using the following command:

python run.py --models hf_llama_7b --datasets reasonbench_ppl
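
Roughly, the `reasonbench_ppl` entry point aggregates per-task configs along the following lines; the variable names, labels, and prompt wording here are illustrative assumptions rather than the PR's exact files:

```python
# Rough shape of a PPL-style dataset config (names and prompts are illustrative).
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator

reasonbench_reader_cfg = dict(input_columns=['question'], output_column='answer')

reasonbench_infer_cfg = dict(
    # PPL scoring: one template per candidate label; the option whose completion
    # gets the lowest perplexity is taken as the model's answer.
    prompt_template=dict(
        type=PromptTemplate,
        template={
            'A': 'Question: {question}\nAnswer: A',
            'B': 'Question: {question}\nAnswer: B',
        }),
    retriever=dict(type=ZeroRetriever),
    inferencer=dict(type=PPLInferencer))

reasonbench_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
```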

@Skyfall-xzz Skyfall-xzz marked this pull request as ready for review November 13, 2023 08:21
@tonysy
Collaborator

tonysy commented Nov 13, 2023

Please provide the prompt for the generation setting (for chat models).

@tonysy
Collaborator

tonysy commented Nov 13, 2023

Also, please use pre-commit to lint the code.

@Skyfall-xzz
Contributor Author

In the new commits, I provide prompts and configs to support generative inference, and merge datasets that belong to the same category.

After ensuring that OpenCompass is installed correctly and the datasets are prepared, you can evaluate the performance of the LLaMA-7b model on the ReasonBench datasets in a generative setting using the following command:

python run.py --models hf_llama_7b --datasets reasonbench_gen
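
For the chat-model prompt requested above, a generation-style infer config would look roughly like the following; the prompt wording is an assumption for illustration, not the PR's exact template:

```python
# Sketch of a generation-style infer config for chat models; the prompt wording
# is an assumption for illustration, not the PR's exact template.
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer

reasonbench_gen_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        # A single HUMAN round presents the question to chat models as a user turn.
        template=dict(round=[
            dict(role='HUMAN',
                 prompt='Question: {question}\nOptions: {options}\n'
                        'Answer with the letter of the correct option.'),
        ])),
    retriever=dict(type=ZeroRetriever),
    inferencer=dict(type=GenInferencer))
```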

Collaborator

@tonysy tonysy left a comment


LGTM

@tonysy tonysy merged commit b35d991 into open-compass:main Dec 20, 2023
7 checks passed
Leymore pushed a commit that referenced this pull request Jan 8, 2024
* [Feature] Add reasonbench dataset

* add configs for supporting generative inference & merge datasets in the same category

* modify config filename to prompt version

* fix codes to meet pre-commit requirements

* lint the code to meet pre-commit requirements

* Align Load_data Sourcecode Briefly

* fix bugs

* reduce code redundancy
liuyaox pushed a commit to liuyaox/opencompass that referenced this pull request Jun 26, 2024