lm-eval for llama.cpp enhancement. #1543

Open

lkk12014402 wants to merge 4 commits into main from enable_llamacpp_lm_eval

Collaborator

lkk12014402 commented May 12, 2024

Type of Change

enable lm-eval for llama.cpp models

API not changed

Description

Based on the official lm-eval code and llama-cpp-python.

improvements:

Load the llama.cpp model directly when running lm-eval (the official code requires launching a llama.cpp server).
For Qwen models, revise the detokenize function, because errors occur during evaluation, and force-add bos_id, because llama-cpp-python does not add it successfully. Even with these Qwen-specific changes, the tokenizer results still differ between llama.cpp and huggingface/transformers. I will verify this further.
As described in the comments in llama-cpp-python, I implement it with a custom class, which can accelerate the post-processing.
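
The forced bos_id step for Qwen models can be pictured with a minimal sketch. This is not the code from this PR: `ensure_bos` is a hypothetical helper name, and the token IDs in the usage lines are placeholders rather than real Qwen vocabulary IDs. It only illustrates the idea of prepending the BOS token when the tokenizer did not add it, without duplicating it when it is already there.

```python
from typing import List


def ensure_bos(tokens: List[int], bos_id: int) -> List[int]:
    """Return a copy of `tokens` that starts with `bos_id`.

    Hypothetical helper: the BOS token is prepended only when it is
    missing, so an already-correct sequence is returned unchanged.
    """
    if tokens and tokens[0] == bos_id:
        return list(tokens)
    return [bos_id] + list(tokens)


# Placeholder IDs for illustration only (not real Qwen token IDs).
BOS_ID = 1
print(ensure_bos([5, 6, 7], BOS_ID))
print(ensure_bos([1, 5, 6], BOS_ID))
```

In a real integration the BOS ID would presumably come from the loaded model rather than a constant; that lookup is left out here to keep the sketch self-contained.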


          lm-eval for llama.cpp enhancement.

184678f

lkk12014402 requested review from changwangss and hshen14

May 12, 2024 09:55

lkk12014402 requested a review from PenghuiCheng as a code owner

May 12, 2024 09:55

github-actions bot commented May 12, 2024

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all workflows will be re-triggered.

Groups summary

🔴 Format Scan Tests workflow

Check ID	Status	Error details
format-scan (pylint)	failure	download	❌
format-scan (bandit)	success		✅
format-scan (cloc)	success		✅
format-scan (cpplint)	success		✅

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py, intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py.

🔴 Optimize Unit Test workflow

Check ID	Status	Error details
optimize-unit-test-baseline	success		✅
optimize-unit-test-PR-test	failure	download	❌
Genreate-OptimizeUT-Report	skipped		❓


🟢 NeuralChat Unit Test

Check ID	Status	Error details
neuralchat-unit-test-baseline	success		✅
neuralchat-unit-test-PR-test	success		✅
Generate-NeuralChat-Report	success		✅


🟢 Engine Unit Test workflow

Check ID	Status	Error details
engine-unit-test-baseline	success		✅
engine-unit-test-PR-test	success		✅
Genreate-Engine-Report	success		✅


🟢 Chat Bot Test workflow

Check ID	Status	Error details
call-inference-llama-2-7b-chat-hf / inference test	success		✅
call-inference-mpt-7b-chat / inference test	success		✅


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.


          [pre-commit.ci] auto fixes from pre-commit.com hooks

fcce20c

for more information, see https://pre-commit.ci

Collaborator Author

lkk12014402 commented May 12, 2024

Usage:

CPU

from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser

# Hugging Face repo that hosts the GGUF files
model_name = "Qwen/Qwen1.5-0.5B-Chat-GGUF"

eval_args = LMEvalParser(
    model="gguf-custom",
    # ftype selects the GGUF file to load, here the q4_0 one
    model_args=f"pretrained={model_name},ftype=*q4_0.gguf",
    device="cpu",
    tasks="hellaswag",
    batch_size=2,
    limit=10,  # evaluate only the first 10 examples
)
results = evaluate(eval_args)

print(results["results"])

GPU

from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser

# Hugging Face repo that hosts the GGUF files
model_name = "Qwen/Qwen1.5-0.5B-Chat-GGUF"

eval_args = LMEvalParser(
    model="gguf-custom",
    # ftype selects the GGUF file to load, here the q4_0 one
    model_args=f"pretrained={model_name},ftype=*q4_0.gguf",
    device="cuda",  # the only difference from the CPU example
    tasks="hellaswag",
    batch_size=2,
    limit=10,  # evaluate only the first 10 examples
)
results = evaluate(eval_args)

print(results["results"])
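
For readers unfamiliar with the `model_args` string, here is a rough sketch of how a `key=value,key=value` string like the one above could be split into fields, and how a wildcard such as `*q4_0.gguf` could pick one file out of several. It is an illustration under assumptions, not the parser used by this PR or by lm-eval: `parse_model_args` and `select_gguf` are hypothetical helpers, the candidate filenames are made up, and treating `ftype` as a shell-style glob is an assumption based on the `*` in the examples.

```python
from fnmatch import fnmatch
from typing import Dict, List


def parse_model_args(model_args: str) -> Dict[str, str]:
    """Split 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'} (hypothetical helper)."""
    parsed = {}
    for item in model_args.split(","):
        if "=" not in item:
            continue  # skip empty or malformed fragments
        key, value = item.split("=", 1)
        parsed[key.strip()] = value.strip()
    return parsed


def select_gguf(filenames: List[str], pattern: str) -> List[str]:
    """Return the filenames that match a shell-style pattern (assumed semantics)."""
    return [name for name in filenames if fnmatch(name, pattern)]


args = parse_model_args("pretrained=Qwen/Qwen1.5-0.5B-Chat-GGUF,ftype=*q4_0.gguf")
# Made-up filenames standing in for the contents of a GGUF repo.
candidates = ["model-q4_0.gguf", "model-q8_0.gguf", "README.md"]
print(args["pretrained"], select_gguf(candidates, args["ftype"]))
```

A pattern that matches more than one file would need a tie-breaking rule; the sketch simply returns every match.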

hshen14 approved these changes


VincyZhang and others added 2 commits

May 12, 2024 20:00


          Merge branch 'main' into enable_llamacpp_lm_eval

c706ae3


          Merge branch 'main' into enable_llamacpp_lm_eval

3048eae
