[Bug Fix] Fix pruning for partially quantized models #1792
Merged
If quantization was set to `True` in the `SparseGPTModifier`, only quantized layers would be compressed by the algorithm. This fix also cleans up the code considerably by using the existing `get_prunable_layers` utility function to search for both quantized and unquantized prunable layers. Now, if a layer is ignored by the `QuantizationModifier`, the `SparseGPTModifier` will still prune it.

Example Recipe:
Testing
```bash
python src/sparseml/transformers/sparsification/obcq/obcq.py Xenova/llama2.c-stories15M open_platypus --recipe tiny_recipe.yaml
```
Even though `model.layer.5.down_proj` is ignored during quantization, it is still pruned by SparseGPT.
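One way to sanity-check this behavior (a hedged sketch, not part of the PR): after running the one-shot command above, measure the fraction of exactly-zero weights in the ignored layer. The output directory, the 40% threshold, and the use of `AutoModelForCausalLM` here are assumptions; only the layer name comes from the PR, and the exact module path may differ by architecture.

```python
from transformers import AutoModelForCausalLM

# Load the compressed model; the save path is an assumption about the script's output.
model = AutoModelForCausalLM.from_pretrained("./obcq_deployment")

# Locate the layer that QuantizationModifier ignored. The name comes from the PR;
# the actual module path depends on the model architecture.
modules = dict(model.named_modules())
layer = modules["model.layer.5.down_proj"]

# If SparseGPT pruned it, a large fraction of its weights should be exactly zero.
weight = layer.weight.detach()
zero_fraction = (weight == 0).float().mean().item()
print(f"model.layer.5.down_proj zero fraction: {zero_fraction:.2%}")
assert zero_fraction > 0.4, "layer does not appear to have been pruned"
```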