
add support for Openbmb/MiniCPM #504

Merged 1 commit into casper-hansen:main on Jun 21, 2024

Conversation

@LDLINGLINGLING (Contributor) commented Jun 17, 2024:

Hello, I am a staff member of OpenBMB responsible for the open source community. This pull request adds support for our openbmb/MiniCPM models. The Hugging Face repositories of the AWQ-quantized models are:
MiniCPM2_2b_awq_int4
MiniCPM2_1b_awq_int4

I also ran a perplexity test for the models above on the wikitext test set; the results are as follows:

| Model | Type | GPU usage | Perplexity | Eval speed |
| --- | --- | --- | --- | --- |
| awq_cpm_1b_4bit | AWQ 4-bit | 1.54 GB | 8.867 | 5.84 it/s (164 it in 28 s) |
| MiniCPM-1B-sft-bf16 | pretrained (bf16) | 3.24 GB | 8.576 | 15.25 it/s (164 it in 10 s) |
| minicpm_1b_4bit | GPTQ 4-bit | 1.9 GB | 9.416 | 17.81 it/s (164 it in 9 s) |
| awq_cpm_2b_4bit | AWQ 4-bit | 2.75 GB | 8.152 | 4.70 it/s (159 it in 33 s) |
| miniCPM-bf16 | pretrained (bf16) | 5.93 GB | 7.981 | 9.18 it/s (159 it in 17 s) |
| minicpm_2b_4bit | GPTQ 4-bit | 3.02 GB | 8.669 | 10.65 it/s (159 it in 14 s) |
If the above code meets your requirements, we look forward to having it merged into the master branch.
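
For readers who want to run a similar check, below is a minimal sliding-window perplexity sketch over the wikitext-2 test split using transformers and datasets. It is not the exact script behind the numbers above; the openbmb/MiniCPM-1B-sft-bf16 model id and the 2048-token chunk length are assumptions.

```python
# Hedged sketch of a wikitext-2 perplexity check for a causal LM on one GPU.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from tqdm import tqdm

model_id = "openbmb/MiniCPM-1B-sft-bf16"  # assumed Hub path of the pretrained model above
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Concatenate the test split into one long token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids.cuda()

seq_len = 2048  # evaluation chunk length (assumed; pick to match your setup)
nlls, n_tokens = [], 0
for begin in tqdm(range(0, input_ids.size(1), seq_len)):
    chunk = input_ids[:, begin : begin + seq_len]
    if chunk.size(1) < 2:
        continue
    with torch.no_grad():
        # labels == inputs: the model shifts them internally for next-token loss
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss.float() * (chunk.size(1) - 1))
    n_tokens += chunk.size(1) - 1

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"Perplexity: {ppl.item():.3f}")
```

The AWQ checkpoint can be evaluated the same way by loading it with AutoAWQForCausalLM.from_quantized in place of AutoModelForCausalLM.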

@LDLINGLINGLING (Contributor, Author) commented Jun 17, 2024:

I don't have permission to submit code. Please tell me how to get permission. OpenBMB is an open source community in China, and MiniCPM is the community's main open source language model. Thank you very much.

@casper-hansen (Owner):

@LDLINGLINGLING thank you for contributing to AutoAWQ. I will review and merge it into the main branch once I have tested it.

I have researched the MiniCPM models before and they are indeed great, some of the best work on small models coming out of China.

@LDLINGLINGLING (Contributor, Author):

Thank you again.

@casper-hansen merged commit a4039bf into casper-hansen:main on Jun 21, 2024
@casper-hansen (Owner):

Thanks for your contribution. I have tested the model and it works.

@casper-hansen (Owner):

Hi @LDLINGLINGLING. AutoAWQ also supports multimodal models. I would love to support MiniCPM-V-2. If you have time and interest, I will certainly help you review any pull request to provide support for a quantized multimodal model.

@LDLINGLINGLING (Contributor, Author):

Sorry for replying to you so late. We have been busy releasing MiniCPM-V 2.6 recently. I am now preparing to do the AWQ quantization of MiniCPM-V 2.6. Can you give me an example? Thank you very much.

@LDLINGLINGLING (Contributor, Author) commented Aug 8, 2024:

Can you give me an example of using the AWQ interface to quantize LLaVA-NeXT? Does it need both image and text data?

@casper-hansen (Owner):

> Can you give me an example of using the AWQ interface to quantize LLaVA-NeXT? Does it need both image and text data?

For LLaVA-NeXT, we only quantize the text part of the model. I'm not sure if it's compatible with MiniCPM-V.

@casper-hansen (Owner):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'llava-hf/llama3-llava-next-8b-hf'
quant_path = 'llama3-llava-next-8b-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, device_map="cuda", low_cpu_mem_usage=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')
```
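
On the earlier question about whether image and text data are needed: since only the text part is quantized, the calibration data is plain text. A hedged follow-up sketch, assuming quantize()'s calib_data argument accepts a list of strings (AutoAWQ samples calibration text from the "pileval" dataset by default); the sample strings are purely illustrative, and the call below would replace the plain model.quantize(tokenizer, quant_config=quant_config) line in the snippet above.

```python
# Hedged sketch: supplying custom text-only calibration data to quantize().
# Reuses model, tokenizer, and quant_config from the snippet above.
calib_samples = [
    "The quick brown fox jumps over the lazy dog.",
    "AWQ searches per-channel scales so that 4-bit weights preserve the "
    "activations of the most salient channels.",
    # ...in practice, a few hundred samples representative of your use case
]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_samples)
```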

@LDLINGLINGLING (Contributor, Author):

Hi, I have quantized MiniCPM-V 2.6, but there is currently a problem: the speed drops a lot. Is there a solution? Is there a big speed improvement after fusing (using the fuser)?
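
For context on the fusing question: AutoAWQ's from_quantized() exposes a fuse_layers flag that fuses attention and MLP modules for faster decoding on supported architectures; whether the fusing code covers MiniCPM or MiniCPM-V 2.6 is not confirmed in this thread. A rough timing sketch under those assumptions, with a hypothetical checkpoint path:

```python
# Hedged sketch: comparing generation speed with and without layer fusing.
# The checkpoint path is hypothetical; fusing only helps on architectures
# that AutoAWQ's fusing code explicitly supports.
import time
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "path/to/minicpm-awq"  # hypothetical local AWQ checkpoint
prompt = "Explain in one sentence what AWQ quantization does."

for fuse in (False, True):
    model = AutoAWQForCausalLM.from_quantized(
        quant_path, fuse_layers=fuse, trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    start = time.time()
    out = model.generate(**inputs, max_new_tokens=128)
    elapsed = time.time() - start

    new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
    print(f"fuse_layers={fuse}: {new_tokens / elapsed:.1f} tokens/s")

    del model
    torch.cuda.empty_cache()  # free GPU memory before loading the next variant
```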
