Hello,
I encountered an out-of-memory error while attempting to quantize a model using GPTQQuantizer. The error seems to be related to the large size of the model weights. Below is the quantization code I used:
from optimum.gptq import GPTQQuantizer

quantizer = GPTQQuantizer(
    bits=4,
    dataset="wikitext2",
    block_name_to_quantize="decoder.layers",  # must be a string (the module path of the decoder blocks)
    disable_exllama=False,
    damp_percent=0.1,
    group_size=128,
)
The error message I received is as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 784.00 MiB. GPU 0 has a total capacty of 10.90 GiB of which 770.44 MiB is free. Including non-PyTorch memory
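As a quick sanity check (my addition, not part of the original report), you can print the free memory on each visible GPU before quantizing, to confirm how much headroom each device actually has:

```python
import torch

# Print free vs. total memory for every visible GPU. This helps confirm
# whether something else is already occupying GPU 0 before quantization
# starts. On a machine without CUDA, the loop simply does nothing.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")
```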
Environment:
· Transformers version: 4.43.2
· Optimum version: 1.21.2
· GPU model and memory: 11GiB * 2
· CUDA version: 12.4
Question: How can I use multiple GPUs with GPTQQuantizer? Thank you!
Who can help?
@kashif @srush @danieldk @mausch @dmaniloff How to use multi-GPU for GPTQQuantizer?
Reproduction (minimal, reproducible, runnable)
from optimum.gptq import GPTQQuantizer
Expected behavior
Quantization with GPTQQuantizer should run across multiple GPUs instead of exhausting the memory of a single GPU.