Load the model onto the GPU or with device_map using HQQModelForCausalLM.from_pretrained? #61

Closed
icoicqico opened this issue Apr 23, 2024 · 12 comments

@icoicqico

icoicqico commented Apr 23, 2024

Hello, it seems that HQQModelForCausalLM.from_pretrained can't load the model with device_map or directly onto the GPU, and the computer crashes because there isn't enough RAM. But when I use the original AutoModelForCausalLM, I can pass a device map and it offloads layers between CPU and GPU without crashing. Because of this I am unable to use this library to load large models. Is there any method to solve this? Thanks.
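
A minimal sketch of the transformers call described above, assuming a standard transformers + accelerate install (the model id is only an example):

import torch
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-70b-hf"  # illustrative; any large checkpoint behaves the same

# device_map="auto" lets accelerate spread layers across the available GPUs
# and offload the rest to CPU RAM, so the full model never has to fit on one device.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)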

@mobicham
Collaborator

Hi @icoicqico, yes, that's correct: the current implementation loads the whole model onto the CPU first before quantizing.
Loading from disk instead of from RAM is on the to-do list.
Which model do you want to use?
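
For context, a rough sketch of the loading path being discussed, based on the hqq README (exact signatures may differ between versions); the full-precision weights are materialized in CPU RAM before any quantization happens:

from hqq.engine.hf import HQQModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig

model_id = "meta-llama/Llama-2-70b-hf"  # illustrative

# Step 1: the whole full-precision model is loaded into CPU RAM ...
model = HQQModelForCausalLM.from_pretrained(model_id)

# Step 2: ... and only then quantized layer by layer.
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
model.quantize_model(quant_config=quant_config)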

@icoicqico
Author

Thanks for the reply. I am trying to fine-tune Mixtral 8x7B and Llama-2 70B.

@mobicham
Collaborator

In the meantime, you can use one of the pre-quantized checkpoints from our Hugging Face repo, for example Mixtral-8x7B-v0.1-hf-4bit_g64-HQQ.

@icoicqico
Author

Thanks for the quantized checkpoint. I tried to use Mixtral-8x7B-v0.1-hf-4bit_g64-HQQ from your Hugging Face repo, and when I try to train it, I get an error:
line 659, in dequantize_Wq_aten
return hqq_aten.dequantize(
AttributeError: 'NoneType' object has no attribute 'dequantize'

@mobicham
Collaborator

It looks like the CUDA backend didn't get installed. What kind of GPU do you have?
Otherwise, try:
HQQLinear.set_backend(HQQBackend.PYTORCH_COMPILE)
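
If anyone else hits the same AttributeError, a short sketch of that fallback, assuming the names exported by hqq.core.quantize:

from hqq.core.quantize import HQQLinear, HQQBackend

# Pure-PyTorch fallback that does not need the compiled hqq_aten extension.
HQQLinear.set_backend(HQQBackend.PYTORCH_COMPILE)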

@icoicqico
Author

Thanks for the reply. I have two RTX A6000s; I will try the backend you mentioned, thanks.

@mobicham
Collaborator

Then it's fine; it should work even with a single A6000!
Can you try:
import hqq_aten

@icoicqico
Author

Importing hqq_aten raises a ModuleNotFoundError. I installed the package with pip install git+https://github.com/mobiusml/hqq.git.

@mobicham
Collaborator

That confirms it, the CUDA backend is not installed. Can you try:

git clone https://github.com/mobiusml/hqq.git;
cd hqq/kernels/;
python setup_cuda.py install; 
cd ../..;

Let me know what kind of error you get
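
Once the build succeeds, a quick sanity check that the extension is importable and the compiled backend can be selected (assuming HQQBackend.ATEN is the name of the CUDA/ATen backend, as in the hqq README):

import hqq_aten  # should now import without ModuleNotFoundError

from hqq.core.quantize import HQQLinear, HQQBackend
HQQLinear.set_backend(HQQBackend.ATEN)  # switch to the compiled CUDA/ATen kernels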

@icoicqico
Author

The detected CUDA version (12.0) mismatches the version that was used to compile
PyTorch (11.8). Please make sure to use the same CUDA versions.
Maybe it's because I am using CUDA 12.0?

@mobicham
Collaborator

Yeah, you have an older PyTorch version. Try updating to the nightly build and make sure you use CUDA 12.1.
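
Something along these lines should do it, assuming the usual nightly wheel index for CUDA 12.1:

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121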

@mobicham
Collaborator

mobicham commented May 3, 2024

You can now use device_map with hqq + transformers: huggingface/transformers#29637
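
For anyone finding this later, a minimal sketch of the usage the linked PR enables, based on the transformers HQQ documentation (requires a transformers release that includes the PR; the model id and parameters are illustrative):

import torch
from transformers import AutoModelForCausalLM, HqqConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"  # illustrative

quant_config = HqqConfig(nbits=4, group_size=64)

# device_map works together with HQQ here: layers are quantized as they are
# loaded and dispatched, so the full-precision model never sits entirely in RAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    quantization_config=quant_config,
)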

@mobicham mobicham closed this as completed May 3, 2024