GPTQ UX config groups support #2273
Conversation
Do we still need the ignore list if we have a targets list? It would be great if we didn't need architecture-specific ignores.
Yeah, we can safely delete the ignore list; we only need to add a module to the ignore list if it would otherwise be covered by one of the config groups. The vLLMQuantizationModifier vs. regular QuantizationModifier naming is just to differentiate between the old and new quantization frameworks for now. We're going to get rid of the old framework soon, and at that point we can rename the modifier. But if the name itself is an immediate problem, sure, we can change it.
Some small recipe nitpicks; also, can we add a unit test for this new feature?
Squashed commit history:
* Split WandaPruningModifier and SparseGPTModifier: make SparseGPT not inherit from the Wanda modifier, decouple SparseGPTModifierPyTorch from WandaPruningModifier, fix docstrings
* Split SparseGPT and GPTQ modifiers (#2272): update OBCQ, extract GPTQ Modifier
* [GPTQ Modifier UX] Update tests to use GPTQModifier for OBCQ-style quantization (#2294): update test recipes
* GPTQ UX config groups support (#2273): add config_groups support to GPTQModifier
* mask_structure preservation test (#2284): preserve weight sparsity if greater than threshold; add argument to preserve the sparsity mask in SparseGPT; fix the case when the mask is None; add a test that the initial mask structure is preserved between consecutive runs; update tensor_follows_mask_structure to check for at least n zeros
* PR comments
* Fix default case
* Update test to use new vLLMQuantizationModifier
* Style

Co-authored-by: Sara Adkins <sara@neuralmagic.com>
This PR enhances the user experience of the `GPTQModifier` by allowing it to directly accept quantization-related arguments, such as `config_groups`. This change simplifies the configuration process, enabling users to specify a single `GPTQModifier` instead of combining both a `QuantizationModifier` and a `GPTQModifier` in a recipe.
Key Changes
`GPTQModifier` now accepts quantization-related arguments directly, facilitating easier and more direct configuration.
Implementation Details
Under the hood, a `vLLMQuantizationModifier` is initialized with the following arguments (a rough sketch of this delegation follows the list):
- `config_groups`
- `ignore`
- `num_calibration_samples`
- `disable_observer_epoch`
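A minimal sketch of that delegation, with stand-in class and method names that are assumptions rather than the PR's actual implementation; only the four forwarded arguments listed above are modeled:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Stand-in for the real vLLMQuantizationModifier (hypothetical fields).
@dataclass
class vLLMQuantizationModifier:
    config_groups: Dict[str, Any]
    ignore: List[str] = field(default_factory=list)
    num_calibration_samples: Optional[int] = None
    disable_observer_epoch: Optional[float] = None


# Stand-in for GPTQModifier's new quantization-aware surface.
@dataclass
class GPTQModifier:
    config_groups: Dict[str, Any]
    ignore: List[str] = field(default_factory=list)
    num_calibration_samples: Optional[int] = None
    disable_observer_epoch: Optional[float] = None

    def _build_quant_modifier(self) -> vLLMQuantizationModifier:
        # Forward the quantization-related arguments to the underlying
        # quantization modifier, so users only declare a GPTQModifier.
        return vLLMQuantizationModifier(
            config_groups=self.config_groups,
            ignore=self.ignore,
            num_calibration_samples=self.num_calibration_samples,
            disable_observer_epoch=self.disable_observer_epoch,
        )
```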
Example Configurations
Old Configuration:
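A minimal sketch of the pre-change style, assuming sparseml's YAML recipe layout embedded as a Python string; the stage name, group name, and quantization values are illustrative, not taken from this PR:

```python
# Hedged sketch: two modifiers are required, with quantization declared
# separately from GPTQ. Stage/group names and values are illustrative.
old_recipe = """
test_stage:
  quant_modifiers:
    vLLMQuantizationModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "channel"
          targets: ["Linear"]
    GPTQModifier:
      block_size: 128
      sequential_update: false
"""
```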
New Simplified Configuration:
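And a sketch of the post-change style, with the same illustrative values as above: a single `GPTQModifier` carries the quantization arguments directly.

```python
# Hedged sketch: one GPTQModifier now accepts config_groups and ignore
# directly, replacing the separate quantization modifier.
new_recipe = """
test_stage:
  quant_modifiers:
    GPTQModifier:
      block_size: 128
      sequential_update: false
      ignore: ["lm_head"]
      config_groups:
        group_0:
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "channel"
          targets: ["Linear"]
"""
```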
End-to-End Script Example
Recipe:
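A minimal end-to-end sketch, assuming sparseml's `oneshot` entrypoint and the `open_platypus` calibration dataset used in the repo's other OBCQ examples; the model name, sample counts, and output path are illustrative placeholders, not taken from this PR:

```python
from sparseml.transformers import oneshot

# Simplified recipe: a single GPTQModifier with quantization arguments inline.
recipe = """
test_stage:
  quant_modifiers:
    GPTQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "channel"
          targets: ["Linear"]
"""

# One-shot calibration and compression; all arguments here are placeholders.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=512,
    output_dir="./model-gptq-w4a16",
)
```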
Output
Command
STDOUT