```
AWQ: 30%|███ | 18/60 [4:08:27<9:39:44, 828.20s/it]
Traceback (most recent call last):
  File "/home/xiaoi/dq/fubin/alignment/Quantization.py", line 14, in <module>
    model.quantize(tokenizer, quant_config=quant_config)
  File "/opt/nlp/anaconda3/envs/moe_new_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/models/base.py", line 230, in quantize
    self.quantizer.quantize()
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 166, in quantize
    scales_list = [
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 167, in <listcomp>
    self._search_best_scale(self.modules[i], **layer)
  File "/opt/nlp/anaconda3/envs/moe_new_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 332, in _search_best_scale
    best_scales = self._compute_best_scale(
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 413, in _compute_best_scale
    raise Exception
Exception
```
Maybe there is a bug in the `AwqQuantizer` class?
Hi @BinFuPKU, thanks for raising the issue. I will need to investigate further to find the cause, and I can see it will not be easy to debug since the model is so large. Have you observed NaN loss values with any smaller models?
For LLMs with a large parameter count, increasing the length of the individual texts in the calibration dataset can help avoid this problem, but it is still recommended to adjust the calibration data to match the model's domain.
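To act on that advice, AutoAWQ's `quantize` accepts a `calib_data` argument that can be a list of strings (in recent versions). Below is a minimal sketch of filtering a calibration corpus to keep only longer texts; the helper name and the `min_chars` threshold are illustrative assumptions, not values recommended by the maintainers:

```python
# Sketch: keep only sufficiently long calibration texts before quantizing.
# build_calib_data and min_chars are illustrative, not part of AutoAWQ.

def build_calib_data(texts, min_chars=2048):
    """Drop short samples that give the scale search too little signal."""
    return [t for t in texts if len(t) >= min_chars]

# Usage with AutoAWQ (not executed here; requires a loaded model/tokenizer):
# calib = build_calib_data(load_my_corpus())
# model.quantize(tokenizer, quant_config=quant_config, calib_data=calib)
```

Filtering by character count is a crude proxy for token length; tokenizing each sample and filtering on token count would be more precise if the extra preprocessing time is acceptable.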
In the early stage it runs well. The exception is raised when the quantization process reaches 30% (quantizing DeepSeek-V2-Chat with AutoAWQ, roughly 1~2 hours in). We can see the loss becomes NaN in this sub-iteration (18/60), which is abnormal.
```
AWQ: 30%|███ | 18/60 [4:07:51<10:14:05, 877.27s/it]
...
Computing Loss (loss: nan): 0%| | 0/1 [00:00<?, ?it/s]
Grid Search (Best: -1): 90%|█████████ | 18/20 [00:19<00:02, 1.02s/it]
Grid Search (Best: -1): 95%|█████████▌| 19/20 [00:19<00:01, 1.02s/it]
Computing Loss: 0%| | 0/1 [00:00<?, ?it/s]
Computing Loss (loss: nan): 0%| | 0/1 [00:00<?, ?it/s]
Grid Search (Best: -1): 95%|█████████▌| 19/20 [00:20<00:01, 1.02s/it]
Grid Search (Best: -1): 100%|██████████| 20/20 [00:20<00:00, 1.02s/it]
AWQ: 30%|███ | 18/60 [4:08:27<9:39:44, 828.20s/it]
Traceback (most recent call last):
  File "/home/xiaoi/dq/fubin/alignment/Quantization.py", line 14, in <module>
    model.quantize(tokenizer, quant_config=quant_config)
  File "/opt/nlp/anaconda3/envs/moe_new_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/models/base.py", line 230, in quantize
    self.quantizer.quantize()
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 166, in quantize
    scales_list = [
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 167, in <listcomp>
    self._search_best_scale(self.modules[i], **layer)
  File "/opt/nlp/anaconda3/envs/moe_new_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 332, in _search_best_scale
    best_scales = self._compute_best_scale(
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 413, in _compute_best_scale
    raise Exception
Exception
```
Maybe there is a bug in the `AwqQuantizer` class?
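The `Grid Search (Best: -1)` lines in the log explain the bare `raise Exception`: NaN compares false against every value, so when every candidate scale yields a NaN loss, the best ratio never moves off its `-1` sentinel and `_compute_best_scale` bails out. A minimal sketch of that failure mode (the function name and sentinel handling here are illustrative, not AutoAWQ's actual internals):

```python
# Sketch of an AWQ-style grid search over scale ratios. If every loss is
# NaN, best_ratio stays at the -1 sentinel -- the condition that makes
# AutoAWQ's _compute_best_scale raise. This is a reproduction of the
# symptom, not AutoAWQ code.

def compute_best_ratio(losses):
    """Pick the ratio with the lowest loss from (ratio, loss) pairs."""
    best_loss = float("inf")
    best_ratio = -1
    for ratio, loss in losses:
        # NaN < best_loss is always False, so a NaN loss never updates
        # best_ratio -- exactly what the log's "Best: -1" shows.
        if loss < best_loss:
            best_loss = loss
            best_ratio = ratio
    if best_ratio == -1:
        raise RuntimeError("no valid scale found: all losses were NaN")
    return best_ratio

nan = float("nan")

# All-NaN losses reproduce the reported failure:
try:
    compute_best_ratio([(r / 20, nan) for r in range(20)])
except RuntimeError as e:
    print(e)  # no valid scale found: all losses were NaN

# With at least one finite loss, the search succeeds:
print(compute_best_ratio([(0.1, nan), (0.5, 0.02)]))  # 0.5
```

This suggests the bug is upstream of the grid search: something in the forward pass or loss computation for this layer is producing NaN (e.g. an fp16 overflow), and the search merely surfaces it as an unhelpful bare `Exception`.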