Hello,
I encountered an out-of-memory error while attempting to quantize a model using GPTQQuantizer. The error seems to be related to the large size of the model weights. Below is the quantization code I used:
from optimum.gptq import GPTQQuantizer

quantizer = GPTQQuantizer(
    bits=4,
    dataset="wikitext2",
    block_name_to_quantize="decoder.layers",  # must be a string (the module path of the decoder blocks)
    disable_exllama=False,
    damp_percent=0.1,
    group_size=128,
)
The error message I received is as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 784.00 MiB. GPU 0 has a total capacty of 10.90 GiB of which 770.44 MiB is free. Including non-PyTorch memory
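As a quick sanity check (my addition, not part of the original report), you can print the free memory on each visible GPU before quantizing, to confirm how much headroom each device actually has:

```python
import torch

# Print free vs. total memory for every visible GPU. This helps confirm
# whether something else is already occupying GPU 0 before quantization
# starts. On a machine without CUDA, the loop simply does nothing.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")
```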
Environment:
· Transformers version: 4.43.2
· Optimum version: 1.21.2
· GPU model and memory: 11GiB * 2
· CUDA version: 12.4
Question: How can I use multiple GPUs with GPTQQuantizer? Thank you!
Who can help?
@kashif @srush @danieldk @mausch @dmaniloff How to use multi-GPU for GPTQQuantizer?
Reproduction (minimal, reproducible, runnable)
from optimum.gptq import GPTQQuantizer
Expected behavior
Quantization with GPTQQuantizer should run across multiple GPUs instead of exhausting the memory of a single GPU.