Split SparseGPT and GPTQ modifiers #2272
Conversation
Overall structure looks good to me; I just saw a couple of small issues. Also, now that SparseGPT is broken out, it's not really OBCQ anywhere since there is no Q. Could we rename all those folders from obcq -> sparsegpt?
Absolutely, good catch! I will add another PR once everything is in the feature branch, to prevent this PR from blowing up.
* Split WandaPruningModifier and SparseGPTModifier
  - Make SparseGPT not inherit from the Wanda modifier
  - Decouple SparseGPTModifierPyTorch from WandaPruningModifier
  - Fix docstrings
* Split SparseGPT and GPTQ modifiers (#2272)
  - Update OBCQ
  - Extract GPTQ Modifier
* [GPTQ Modifier UX] Update tests to use GPTQModifier for OBCQ-style quantization (#2294)
  - Update OBCQ
  - Extract GPTQ Modifier
  - Update test recipes
* GPTQ UX config groups support (#2273)
  - Update OBCQ
  - Extract GPTQ Modifier
  - Update test recipes
  - Add config_groups support to GPTQModifier
* mask_structure preservation test (#2284)
  - test
  - Preserve weight sparsity if greater than threshold
  - Add argument to preserve sparsity mask in SparseGPT
  - Fix case when mask is None
  - Add test to check mask_structure: the initial mask structure should be preserved between consecutive runs
  - Update tensor_follows_mask_structure to check for at least n zeros (see the sketch after this list)
  - Co-authored-by: Sara Adkins <sara@neuralmagic.com>
* PR comments
  - Co-authored-by: Sara Adkins <sara@neuralmagic.com>
* Fix default case
* Update test to use new vLLMQuantizationModifier
* Style
  - Co-authored-by: Sara Adkins <sara@neuralmagic.com>
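The `mask_structure` commits above concern N:M sparsity patterns (e.g. 2:4, meaning at least 2 zeros in every group of 4 weights). As a rough illustration of the check named in the last commit, here is a minimal torch-based sketch; the function name comes from the commit list, but the signature and behavior shown are assumptions, not the repository's actual helper:

```python
import torch


def tensor_follows_mask_structure(tensor: torch.Tensor, mask: str = "2:4") -> bool:
    """Return True if every group of m consecutive values has at least n zeros.

    Illustrative reimplementation of the test helper named in the commit
    list above; the real signature in the repository may differ.
    """
    n, m = (int(part) for part in mask.split(":"))
    if tensor.numel() % m != 0:  # groups of m must tile the tensor exactly
        return False
    zeros_per_group = (tensor.reshape(-1, m) == 0).sum(dim=1)
    # "at least n zeros" per group, matching the commit message above
    return bool((zeros_per_group >= n).all())
```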
This PR introduces a structural change by separating concerns between quantization and sparsification. A new `GPTQModifier` is extracted from the existing `SparseGPTModifier`. This ensures that each class now has a focused responsibility: `GPTQModifier` manages quantization, while `SparseGPTModifier` is dedicated to sparsification.
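As a rough picture of the intended separation, here is a minimal sketch; the class names follow this PR, but the constructor arguments shown are simplified assumptions, not the modifiers' real fields:

```python
from typing import Optional


class SparseGPTModifier:
    """After the split: one-shot sparsification only, no quantization args."""

    def __init__(self, sparsity: float = 0.5, mask_structure: str = "0:0"):
        self.sparsity = sparsity
        self.mask_structure = mask_structure


class GPTQModifier:
    """Carved out of the old SparseGPTModifier: quantization only."""

    def __init__(self, config_groups: Optional[dict] = None, block_size: int = 128):
        self.config_groups = config_groups or {}
        self.block_size = block_size
```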
Changes

- Extraction of `GPTQModifier`: carved out from `SparseGPTModifier`, this new class handles all aspects related to quantization, including the arguments specific to the quantization process.
- Refinement of `SparseGPTModifier` and `SparseGPTWrapper`: these have been updated to focus solely on sparsification; all quantization-related functionality has been removed.
- Creation of `GPTQWrapper`: implemented to apply quantization using OBQ.
- Update on test recipes: modified the OBCQ test recipes to align with the new modifiers.
- Addition of tests: introduced new tests specifically for `GPTQModifier` to ensure functionality and stability (see the sketch after this list).
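For a sense of what such a test might assert, here is a hedged sketch that builds on the toy classes above; the attributes checked are the illustrative ones from that sketch, not the real API:

```python
def test_gptq_modifier_owns_only_quantization_settings():
    # Uses the toy GPTQModifier sketched earlier; assertions are illustrative.
    modifier = GPTQModifier(config_groups={}, block_size=128)
    # Quantization settings live on GPTQModifier after the split...
    assert modifier.block_size == 128
    # ...while sparsity settings were removed along with the inheritance.
    assert not hasattr(modifier, "sparsity")
```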
Test Plan