[GPTQQuantizer] How to use multi-GPU for GPTQQuantizer? #1981

Closed
RunTian1 opened this issue Aug 5, 2024 · 2 comments
Labels
bug Something isn't working

Comments


RunTian1 commented Aug 5, 2024

System Info

Hello,
I encountered an out-of-memory error while attempting to quantize a model using GPTQQuantizer. The error seems to be related to the large size of the model weights. Below is the quantization code I used:

from optimum.gptq import GPTQQuantizer

quantizer = GPTQQuantizer(
    bits=4,
    dataset='wikitext2',
    block_name_to_quantize='decoder.layers',  # must be a string naming the block module to quantize
    disable_exllama=False,
    damp_percent=0.1,
    group_size=128
)

The error message I received is as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 784.00 MiB. GPU 0 has a total capacty of 10.90 GiB of which 770.44 MiB is free. Including non-PyTorch memory

Environment:
· Transformers version: 4.43.2
· Optimum version: 1.21.2
· GPU model and memory: 11GiB * 2
· CUDA version: 12.4
Question: How can I use multiple GPUs with GPTQQuantizer? Thank you!

Who can help?

@kashif @srush @danieldk @mausch @dmaniloff How can I use multiple GPUs with GPTQQuantizer?

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

from optimum.gptq import GPTQQuantizer

quantizer = GPTQQuantizer(
    bits=4,
    dataset='wikitext2',
    block_name_to_quantize='decoder.layers',  # must be a string naming the block module to quantize
    disable_exllama=False,
    damp_percent=0.1,
    group_size=128
)
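
Note that the snippet above only constructs the quantizer; the out-of-memory error is raised later, when the quantizer is applied to a model. A minimal sketch of that missing step, using a placeholder checkpoint (the report does not name the model being quantized):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = 'facebook/opt-1.3b'  # placeholder; the report does not name the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# block_name_to_quantize is omitted here; optimum infers it for common architectures
quantizer = GPTQQuantizer(bits=4, dataset='wikitext2', damp_percent=0.1, group_size=128)

# The OOM from the report is raised during this call, while calibration
# activations and quantized weights are materialized on the GPU:
quantized_model = quantizer.quantize_model(model, tokenizer)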

Expected behavior

GPTQQuantizer should be able to distribute the quantization across multiple GPUs instead of running out of memory on GPU 0.

RunTian1 added the bug (Something isn't working) label Aug 5, 2024
IlyasMoutawwakil (Member) commented

You'll need to pass in the transformers model loaded with a device_map, so that it is distributed across your GPUs.
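
Concretely, this means loading the model with device_map="auto" before quantizing, so that accelerate shards the weights across both cards. A minimal sketch, assuming an OPT-style checkpoint (placeholder name) and the accelerate library installed:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = 'facebook/opt-1.3b'  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map='auto' lets accelerate spread the fp16 weights across all
# visible GPUs (here, both 11 GiB cards) instead of loading them on GPU 0.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map='auto',
)

quantizer = GPTQQuantizer(
    bits=4,
    dataset='wikitext2',
    block_name_to_quantize='model.decoder.layers',  # module path for OPT; adjust for your model
    damp_percent=0.1,
    group_size=128,
)
quantized_model = quantizer.quantize_model(model, tokenizer)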

RunTian1 closed this as completed Aug 8, 2024

RunTian1 commented Aug 8, 2024

Thank you! Problem solved.
