add support for Openbmb/MiniCPM #504
Conversation
I don't have permission to submit code. Please tell me how to get permission. OpenBMB is an open-source community in China, and MiniCPM is the community's main open-source language model. Thank you very much.
@LDLINGLINGLING thank you for contributing to AutoAWQ. I will review and merge it into the main branch once I have tested it. I have researched the MiniCPM models before and they are indeed great, some of the best work on small models coming out of China.
Thank you again.
Thanks for your contribution. I have tested the model and it works.
Hi @LDLINGLINGLING. AutoAWQ also supports multimodal models. I would love to support MiniCPM-V-2. If you have time and interest, I will certainly help you review any pull request to provide support for a quantized multimodal model.
Sorry for replying so late. We have been busy releasing MiniCPM-V 2.6 recently. I am now preparing to do AWQ quantization of MiniCPM-V 2.6. Can you give me an example? Thank you very much.
@LDLINGLINGLING You can find the LLaVA Next quantization example in the documentation: https://casper-hansen.github.io/AutoAWQ/examples/#vision-language-models
Can you give me an example of using the AWQ interface to quantize LLaVA Next? Does it need image and text data?
For LLaVA Next, we only quantize the text part of the model. I'm not sure if it's compatible with MiniCPM-V.
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'llava-hf/llama3-llava-next-8b-hf'
quant_path = 'llama3-llava-next-8b-awq'
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, device_map="cuda", low_cpu_mem_usage=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')
```
Hi, I have quantized MiniCPM-V 2.6, but there is currently a problem: inference speed drops a lot. Is there any solution? Is there a big speed improvement after fusing layers?
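For context, AutoAWQ's `from_quantized` loader accepts a `fuse_layers` flag that swaps in fused attention/MLP modules, which usually recovers generation speed. A minimal sketch, assuming AutoAWQ is installed (the `quant_path` argument is a placeholder, not a path from this thread):

```python
def load_fused(quant_path: str):
    """Load an AWQ-quantized model with fused modules for faster inference.

    Sketch only: assumes the `awq` package is installed and `quant_path`
    points at a saved AWQ checkpoint.
    """
    from awq import AutoAWQForCausalLM  # imported lazily to keep the sketch self-contained

    # fuse_layers=True replaces attention and MLP blocks with fused kernels
    return AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
```

Whether fusion helps a multimodal model like MiniCPM-V depends on whether its language tower matches one of the fused architectures.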
Hello, I am a staff member at OpenBMB responsible for the open-source community. This pull request adds support for our openbmb/MiniCPM models. The AWQ-quantized models are available at the following Hugging Face addresses:
MiniCPM2_2b_awq_int4
MiniCPM2_1b_awq_int4
I also ran a perplexity test of the above models on the WikiText test set; the results are as follows:
| Model | Type | GPU usage | Perplexity | Eval speed |
|---|---|---|---|---|
| awq_cpm_1b_4bit | AWQ | 1.54 GB | 8.867 | 5.84 it/s |
| MiniCPM-1B-sft-bf16 | pretrained | 3.24 GB | 8.576 | 15.25 it/s |
| minicpm_1b_4bit | GPTQ | 1.90 GB | 9.416 | 17.81 it/s |
| awq_cpm_2b_4bit | AWQ | 2.75 GB | 8.152 | 4.70 it/s |
| miniCPM-bf16 | pretrained | 5.93 GB | 7.981 | 9.18 it/s |
| minicpm_2b_4bit | GPTQ | 3.02 GB | 8.669 | 10.65 it/s |
If the above code meets your requirements, we look forward to it being merged into the master branch.