[Bug Fix] Fix pruning for partially quantized models #1792
Merged
If quantization was set to `True` in the `SparseGPTModifier`, only quantized layers would be compressed by the algorithm. This fix also cleans up the code considerably by using the existing `get_prunable_layers` utility function to search for both quantized and unquantized prunable layers. Now, if a layer is ignored by the `QuantizationModifier`, the `SparseGPTModifier` will still prune it.

Example Recipe:
Testing
```bash
python src/sparseml/transformers/sparsification/obcq/obcq.py Xenova/llama2.c-stories15M open_platypus --recipe tiny_recipe.yaml
```
Even though `model.layer.5.down_proj` is ignored during quantization, it is still pruned by SparseGPT.
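One way to sanity-check this behavior (a hedged sketch, not part of the PR): after running the one-shot command above, measure the fraction of exactly-zero weights in the ignored layer. The output directory, the 40% threshold, and the use of `AutoModelForCausalLM` here are assumptions; only the layer name comes from the PR, and the exact module path may differ by architecture.

```python
from transformers import AutoModelForCausalLM

# Load the compressed model; the save path is an assumption about the script's output.
model = AutoModelForCausalLM.from_pretrained("./obcq_deployment")

# Locate the layer that QuantizationModifier ignored. The name comes from the PR;
# the actual module path depends on the model architecture.
modules = dict(model.named_modules())
layer = modules["model.layer.5.down_proj"]

# If SparseGPT pruned it, a large fraction of its weights should be exactly zero.
weight = layer.weight.detach()
zero_fraction = (weight == 0).float().mean().item()
print(f"model.layer.5.down_proj zero fraction: {zero_fraction:.2%}")
assert zero_fraction > 0.4, "layer does not appear to have been pruned"
```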