WenMind Benchmark

WenMind is a comprehensive benchmark for evaluating Large Language Models (LLMs) in Chinese Classical Literature and Language Arts (CCLLA). It covers three sub-domains (Ancient Prose, Ancient Poetry, and Ancient Literary Culture) and comprises 4,875 question-answer pairs spanning 42 fine-grained tasks, 3 question formats, and 2 evaluation scenarios: domain-oriented and capability-oriented.

Download

You can obtain the complete WenMind evaluation dataset from the WenMind Benchmark folder on GitHub.

Data Format

  {
    "id": 2464,
    "domain": "ancient literary culture",
    "capability": "knowledge",
    "question_format": "QA",
    "coarse_grained_task_zh": "成语",
    "coarse_grained_task_en": "idiom",
    "fine_grained_task_zh": "成语解释",
    "fine_grained_task_en": "idiom explanation",
    "question": "解释下面成语的意思：\n暮去朝来",
    "answer": "黄昏过去，清晨又到来。形容时光流逝。"
  }
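
The record above can be parsed and sanity-checked with a few lines of Python. This is a minimal sketch, not the official evaluation code: it assumes the dataset is stored as a JSON array of such records (the actual file layout may differ), and the helper names `missing_fields` and `count_by_scenario` are hypothetical, introduced here only for illustration.

```python
import json
from collections import Counter

# Field names taken from the example record above.
REQUIRED_FIELDS = (
    "id", "domain", "capability", "question_format",
    "coarse_grained_task_zh", "coarse_grained_task_en",
    "fine_grained_task_zh", "fine_grained_task_en",
    "question", "answer",
)

# Assumption: the dataset is a JSON array of records. A raw string keeps the
# "\n" escape inside the question intact for the JSON parser.
SAMPLE_JSON = r'''
[
  {
    "id": 2464,
    "domain": "ancient literary culture",
    "capability": "knowledge",
    "question_format": "QA",
    "coarse_grained_task_zh": "成语",
    "coarse_grained_task_en": "idiom",
    "fine_grained_task_zh": "成语解释",
    "fine_grained_task_en": "idiom explanation",
    "question": "解释下面成语的意思：\n暮去朝来",
    "answer": "黄昏过去，清晨又到来。形容时光流逝。"
  }
]
'''


def missing_fields(record):
    """Return the required field names that the record lacks."""
    return [name for name in REQUIRED_FIELDS if name not in record]


def count_by_scenario(records):
    """Count records per (domain, capability, question_format) triple."""
    return Counter(
        (r["domain"], r["capability"], r["question_format"]) for r in records
    )


records = json.loads(SAMPLE_JSON)
for record in records:
    # Fail early if a record does not match the documented format.
    assert not missing_fields(record), missing_fields(record)

print(count_by_scenario(records))
```

Grouping by `domain` and `capability` mirrors the two evaluation scenarios (domain-oriented and capability-oriented), so the same counts can be reused to report per-scenario scores once model outputs are available.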

TODO

We will soon release the code for model evaluation.

License

The work is licensed under an MIT License.

The WenMind benchmark is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
