
raise Exception (the loss increases to NaN) when quantizing DeepSeek-V2-Chat with the new version of AutoAWQ at sub-iteration (18/60) #535

Open
BinFuPKU opened this issue Jul 9, 2024 · 4 comments

Comments

BinFuPKU commented Jul 9, 2024

At the start, it runs well!

It raises an exception when the quantization process reaches 30% (quantizing DeepSeek-V2-Chat with AutoAWQ takes about 1~2 hours).

The loss increases to NaN in this sub-iteration (18/60), which is abnormal.

```
AWQ:  30%|███       | 18/60 [4:07:51<10:14:05, 877.27s/it]
...
Computing Loss (loss: nan):   0%|          | 0/1 [00:00<?, ?it/s]
Grid Search (Best: -1):  90%|█████████ | 18/20 [00:19<00:02, 1.02s/it]
Grid Search (Best: -1):  95%|█████████▌| 19/20 [00:19<00:01, 1.02s/it]
Computing Loss:   0%|          | 0/1 [00:00<?, ?it/s]
Computing Loss (loss: nan):   0%|          | 0/1 [00:00<?, ?it/s]
Grid Search (Best: -1):  95%|█████████▌| 19/20 [00:20<00:01, 1.02s/it]
Grid Search (Best: -1): 100%|██████████| 20/20 [00:20<00:00, 1.02s/it]
AWQ:  30%|███       | 18/60 [4:08:27<9:39:44, 828.20s/it]
Traceback (most recent call last):
  File "/home/xiaoi/dq/fubin/alignment/Quantization.py", line 14, in <module>
    model.quantize(tokenizer, quant_config=quant_config)
  File "/opt/nlp/anaconda3/envs/moe_new_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/models/base.py", line 230, in quantize
    self.quantizer.quantize()
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 166, in quantize
    scales_list = [
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 167, in <listcomp>
    self._search_best_scale(self.modules[i], **layer)
  File "/opt/nlp/anaconda3/envs/moe_new_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 332, in _search_best_scale
    best_scales = self._compute_best_scale(
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 413, in _compute_best_scale
    raise Exception
Exception
```

Maybe there is a bug in the `AwqQuantizer` class?
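For context, the `raise Exception` at the end of the traceback fires when the grid search over scale ratios never finds a valid candidate: every candidate loss is NaN, so the best index stays at its initial -1 (matching the `Grid Search (Best: -1)` lines in the log). Below is a simplified, hypothetical sketch of that failure mode; the function name and loop shape are illustrative, not AutoAWQ's actual code:

```python
import math

def search_best_ratio(losses):
    """Pick the grid index with the lowest finite loss.

    Mirrors the failure mode in the log above: if every candidate
    loss is NaN, no index is ever selected and best_ratio stays -1.
    """
    best_ratio = -1
    best_loss = float("inf")
    for i, loss in enumerate(losses):
        if math.isnan(loss):  # NaN never compares less-than, so skip it
            continue
        if loss < best_loss:
            best_loss = loss
            best_ratio = i
    if best_ratio == -1:
        # Same terminal state as the traceback: no usable scale was found
        raise Exception("no valid scale found: all candidate losses were NaN")
    return best_ratio
```

With at least one finite candidate this returns normally, e.g. `search_best_ratio([float("nan"), 0.5, 0.3])` returns 2; with all-NaN losses it raises, which is the state the quantizer hit at sub-iteration 18/60.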

@casper-hansen (Owner)

Hi @BinFuPKU, thanks for raising the issue. I will need to investigate further to find the cause, but I can see it will not be easy to debug since the model is so large. Do you have any smaller models in which you have observed NaN loss values?

@WanBenLe (Contributor)

For large-parameter LLMs, increasing the length of individual texts in the calibration dataset can help avoid this problem, but it is still recommended to adjust the calibration data to fit the model.
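One way to follow that advice is to pre-filter the calibration corpus so only sufficiently long samples are used. The sketch below is illustrative: the `min_words` threshold and the sample corpus are made up, and whether your installed AutoAWQ version accepts a custom calibration list (e.g. via a `calib_data` argument to `model.quantize`) should be checked against its signature:

```python
def filter_calib_texts(texts, min_words=64):
    """Keep only calibration samples with at least `min_words`
    whitespace-separated tokens; very short samples give the quantizer
    little signal and (per the comment above) can contribute to NaN losses.
    """
    return [t for t in texts if len(t.split()) >= min_words]

# Hypothetical corpus: a mix of short and long samples
corpus = ["hi there", "lorem ipsum " * 100]
long_only = filter_calib_texts(corpus, min_words=64)
```

If your AutoAWQ version supports it, the filtered list can then be passed to the quantize call from the traceback, e.g. `model.quantize(tokenizer, quant_config=quant_config, calib_data=long_only)`.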

@Kk1984up

I got the same issue when quantizing qwen2-7b-chat using AutoAWQ version 0.2.3. How can I fix it?

@casper-hansen (Owner)

@Kk1984up try upgrading to the newest version
