Fix for NaNs in Smooth Quant #1872

Merged: 4 commits merged into main from smooth_nan_fix on Dec 1, 2023
Conversation

@Satrat Satrat (Contributor) commented Dec 1, 2023

Issue first noticed on teknium/OpenHermes-2.5-Mistral-7B. When calculating the activation scales, it's possible to get a scale of 0, which produces NaN weights that error out when running the forward pass during quantization calibration.

The fix is to set a minimum scale of 1e-5 to avoid a divide by 0.
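For illustration, a minimal sketch (not the actual diff) of the kind of guard described above: clamping the per-channel scales to a small epsilon before the smoothing division keeps the resulting weights finite. The function and argument names below are placeholders, not sparseml's API.

import torch

# Hypothetical sketch of the guard: clamp observed scales to a small epsilon
# so the smoothing division can never produce inf/NaN values.
MIN_SCALE = 1e-5  # minimum scale from the PR description

def compute_smoothing_factors(activation_scales: torch.Tensor,
                              weight_scales: torch.Tensor,
                              alpha: float = 0.5) -> torch.Tensor:
    # Guard channels whose observed range is exactly zero
    activation_scales = torch.clamp(activation_scales, min=MIN_SCALE)
    weight_scales = torch.clamp(weight_scales, min=MIN_SCALE)
    # SmoothQuant-style balancing: s = act^alpha / weight^(1 - alpha)
    return activation_scales.pow(alpha) / weight_scales.pow(1.0 - alpha)

acts = torch.tensor([3.0, 0.0, 1.5])   # one channel never activated during calibration
wts = torch.tensor([0.8, 0.2, 0.0])
print(compute_smoothing_factors(acts, wts))  # finite everywhere thanks to the clamp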

This PR also adds a seqlen argument to the OBCQ script; with the maximum sequence length, Mistral was running out of memory during the perplexity eval.

See the Slack thread for more info on the bug: https://neuralmagic.slack.com/archives/C04SRPGT5MW/p1700515011493959

Testing

src/sparseml/transformers/sparsification/obcq/obcq.py teknium/OpenHermes-2.5-Mistral-7B open_platypus --recipe recipe_mistral.yaml --precision float16 --seqlen 512 --eval wikitext2

Runs to completion now; previously it failed with:

assert min_val <= max_val, "min {} should be less than max {}".format(
AssertionError: min nan should be less than max nan
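For context, a standalone illustration (plain PyTorch, not sparseml code) of how a zero activation scale leads to non-finite weights and ultimately to the NaN min/max statistics in the assertion above:

import torch

# A zero per-channel scale means the balanced layer's weights get divided by 0,
# producing inf/nan entries that propagate NaN through the calibration forward
# pass and break the observer's "min <= max" check.
layernorm_weight = torch.tensor([1.0, 0.0, 2.0])  # a zero weight entry is common
scales = torch.tensor([0.5, 0.0, 0.0])            # one channel saw no activation range

balanced = layernorm_weight / scales
print(balanced)                                   # tensor([2., nan, inf]): 0/0 is nan, 2/0 is inf
print(torch.isnan(balanced).any(), torch.isinf(balanced).any())  # True True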

recipe_mistral.yaml

test_stage:
  obcq_modifiers:
    LogarithmicEqualizationModifier:
      mappings: [
        [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
        [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"]
      ]
    QuantizationModifier:
      ignore:
        # These operations don't make sense to quantize
        - MistralRotaryEmbedding
        - MistralRMSNorm
        - SiLUActivation
        # Skip quantizing the BMMs
        # - QuantizableMatMul
        # Skip quantizing the layers with the most sensitive activations
        - model.layers.1.mlp.down_proj
        - model.layers.31.mlp.down_proj
        - model.layers.30.mlp.down_proj
        - model.layers.30.mlp.gate_proj
        - model.layers.30.mlp.up_proj
      post_oneshot_calibration: true
      scheme_overrides:
        Embedding:
          input_activations: null
          weights:
            num_bits: 8
            symmetric: false
    SparseGPTModifier:
      sparsity: 0.5
      block_size: 128
      sequential_update: true
      quantize: true
      percdamp: 0.01
      mask_structure: "0:0"
      targets: ["re:model.layers.\\d*$"]

Perplexity results:

2023-12-01 16:07:27 sparseml.modifiers.obcq.utils.helpers INFO     Evaluating perplexity...
2023-12-01 16:07:34 sparseml.modifiers.obcq.utils.helpers INFO     tensor(16.5364, device='cuda:4')
2023-12-01 16:07:41 sparseml.modifiers.obcq.utils.helpers INFO     tensor(19.9614, device='cuda:4')
2023-12-01 16:07:49 sparseml.modifiers.obcq.utils.helpers INFO     tensor(17.2977, device='cuda:4')
2023-12-01 16:07:56 sparseml.modifiers.obcq.utils.helpers INFO     tensor(14.8696, device='cuda:4')
2023-12-01 16:08:04 sparseml.modifiers.obcq.utils.helpers INFO     tensor(15.0391, device='cuda:4')
2023-12-01 16:08:11 sparseml.modifiers.obcq.utils.helpers INFO     tensor(15.0188, device='cuda:4')

@Satrat Satrat marked this pull request as ready for review December 1, 2023 15:05
@Satrat Satrat requested a review from anmarques December 1, 2023 16:15
@mgoin mgoin merged commit c722fc3 into main Dec 1, 2023
12 checks passed
@mgoin mgoin deleted the smooth_nan_fix branch December 1, 2023 18:22