Release OpenCompass v0.2.3 · open-compass/opencompass

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.3! This version is packed with new features, crucial fixes, and documentation updates to improve your experience. We're continuously working to enhance OpenCompass, making it more robust and versatile for all users.

🌟 Highlights:

- Enhanced Model Support: New models and configurations, including support for LightllmApi, the lmdeploy pytorch engine, and VLLM model configs.
- New Datasets and Benchmarks: Added OpenFinData and the lveval benchmark, and upgraded the needle-in-a-haystack experiment to Needlebench.
- Documentation and Sync Improvements: Updated dataset pack URLs, fixed documentation errors, and synchronized with internal code for consistency.

Explore the key updates in this release:

🌟 New Features:

📦 Dataset and Benchmark Expansion:
- Support for new datasets like OpenFinData and an upgrade to Needlebench, offering broader evaluation capabilities (#896, #913).
- Introduction of the lveval benchmark to enrich the evaluation landscape (#914).
🛠 Model and API Integrations:
- Support for LightllmApi input_format and prompt templates, get_ppl for TurbomindModel, and the lmdeploy pytorch engine (#888, #945, #878, #875).
- New model configurations added, including support for gemini and deepseek-coder, further broadening the tools available for users (#931, #943).
📖 Documentation and Sync Updates:
- Updated dataset pack URLs and rank link in README to ensure users have access to the latest resources (#922, #911).
- Several syncs with internal code and a GitHub blacklist update to maintain consistency (#929, #941, #953).

🐛 Bug Fixes:

- Fixed configuration and template issues, including the qwen2-beta to qwen1.5 rename, the chatglm2 config, and the moss template config (#894, #893, #897).
- Fixed IFEval issues, including type hints and config bugs (#906, #909, #915).

🎉 Welcome New Contributors:

We're delighted to welcome our new contributors: @xu-song, @x22x22, @yuantao2108, and @fanqiNO1. Your contributions are invaluable to the growth of OpenCompass!

🔗 Full Changelog

Support LightllmApi input_format by @helloyongyang in #888
[Fix] rename qwen2-beta -> qwen1.5 by @Leymore in #894
[Fix] Fix chatglm2 config by @Leymore in #893
[Fix] Fix moss template config by @xu-song in #897
Support lmdeploy pytorch engine by @RunningLeon in #875
[Fix] fix ifeval by @bittersweet1999 in #906
[Fix] fix ifeval by @jingmingzhuo in #909
[Fix] Fix type hint in IFEval for python<=3.8 by @Leymore in #915
[Docs] Update dataset pack urls by @Leymore in #922
[Sync] update github blacklist by @Leymore in #929
[Feature] add support for gemini by @bittersweet1999 in #931
[Feature] Support OpenFinData by @Skyfall-xzz in #896
[Fix] Fixed the problem of never entering task.run() mode in local scheduling mode by @x22x22 in #930
Add VLLM Model Configs by @DseidLi in #938
[Feature] Upgrade the needle-in-a-haystack experiment to Needlebench by @DseidLi in #913
[Feature] add lveval benchmark by @yuantao2108 in #914
[Sync] Sync with internal 2023.03.04 by @Leymore in #941
[Fix] fix a bug of humanevalplus config by @jingmingzhuo in #944
[Feature] Add configs of deepseek-coder by @jingmingzhuo in #943
Fix FinanceIQ_datasets import error by @xu-song in #939
[Docs] Update rank link in README by @fanqiNO1 in #911
Support get_ppl for TurbomindModel by @RunningLeon in #878
Support prompt template for LightllmApi. Update LightllmApi token bucket. by @helloyongyang in #945
Fix LightllmApi ppl test by @helloyongyang in #951
[Fix] Chinese version of ReadTheDoc by @tonysy in #947
[fix] add different temp for different question in mtbench by @bittersweet1999 in #954
[Sync] Sync with internal codes 2024.03.08 by @Leymore in #953
[Docs] Update README by @tonysy in #956
[Misc] Update owners by @Leymore in #961
[Fix] Use logger.error on failure by @Leymore in #960
[Sync] Bump version 0.2.3 by @Leymore in #957

For a detailed overview of all changes, check out our Full Changelog.
