GPTQ UX config groups support #2273

Merged: 7 commits merged from gptq-ux-config-groups into quant-modifier-ux on May 20, 2024

Conversation

rahul-tuli
Member

This PR improves the user experience of the GPTQModifier by allowing it to accept quantization-related arguments, such as config_groups, directly. This simplifies configuration: users can specify a single GPTQModifier instead of combining a separate QuantizationModifier and GPTQModifier in a recipe.
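
For illustration, here is a minimal Python sketch of the new single-modifier flow. The import path and keyword names are assumptions inferred from the recipe examples below, not a confirmed API:

# Hypothetical sketch: build a GPTQModifier with quantization settings
# attached directly, instead of pairing it with a separate
# vLLMQuantizationModifier. Import path and kwargs are assumed from the
# recipe examples in this PR and may differ from the final API.
from sparseml.modifiers.quantization.gptq import GPTQModifier

gptq = GPTQModifier(
    ignore=["lm_head", "Embedding"],
    sequential_update=True,
    block_size=128,
    config_groups={
        "group_0": {
            "targets": ["Linear"],
            "weights": {
                "num_bits": 8,
                "type": "int",
                "symmetric": True,
                "strategy": "tensor",
            },
        }
    },
)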

Key Changes

  • Direct Argument Acceptance: GPTQModifier now accepts quantization-related arguments directly, making configuration simpler and more direct.
  • Enhanced Control: This update exposes more fine-grained control over quantization settings to users, improving usability and customization.

Implementation Details

Under the hood, a vLLMQuantizationModifier is initialized with:

  • config_groups
  • ignore
  • num_calibration_samples
  • disable_observer_epoch

Example Configurations

Old Configuration:

# Example of the previous complex setup
test_stage:
    obcq_modifiers:
      vLLMQuantizationModifier:
        ignore: [...]
        config_groups:
            group_0:
                targets: ["Linear"]
                # Further settings...
      GPTQModifier:
          # Additional settings...

New Simplified Configuration:

# Simplified setup with integrated quantization settings
test_stage:
    obcq_modifiers:
      GPTQModifier:
          ignore: [...]
          config_groups:
            group_0:
                targets: ["Linear"]
                # Further settings...
          # Additional simplified settings...

End-to-End Script Example

Recipe:

#  local/feature/gptq_ux/recipes/recipe_config_groups.yaml

test_stage:
    obcq_modifiers:
      GPTQModifier:
          ignore: ["LlamaRotaryEmbedding", "LlamaRMSNorm", "SiLUActivation", "MatMulLeftInput_QK", "MatMulRightInput_QK", "MatMulLeftInput_PV", "MatMulRightInput_PV", "MatMulOutput_QK", "MatMulOutput_PV", "lm_head", "Embedding"]
          sequential_update: True
          dampening_frac: 0.001
          block_size: 128
          config_groups:
            group_0:
                targets: ["Linear"]
                input_activations: null
                output_activations: null
                weights:
                    num_bits: 8
                    type: "int"
                    symmetric: true
                    strategy: "tensor"
                    group_size: 128
# local/feature/get_quant_model.py 

from pathlib import Path
from sparseml.transformers import SparseAutoModelForCausalLM, oneshot
import argparse
from datetime import datetime

tinyllama_stub = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tiny_random_llama_stub = "HuggingFaceH4/tiny-random-LlamaForCausalLM"

parser = argparse.ArgumentParser(description="Get Quant Model")
parser.add_argument('--recipe', default="/root/projects/sparseml/local/feature/recipe.yaml", help='Path to the recipe')
parser.add_argument('--model_stub', default=tinyllama_stub, help='Model stub')
parser.add_argument('--dataset', default="open_platypus", help='Dataset name')
parser.add_argument('--max_seq_length', type=int, default=512, help='Maximum sequence length')
parser.add_argument('--output_dir', default=None, help='Output directory')
parser.add_argument('--num_calibration_samples', type=int, default=512, help='Number of calibration samples')
parser.add_argument('--overwrite_output_dir', action='store_true', help='Overwrite output directory')
parser.add_argument('--small', action='store_true', help='Use a small model')
args = parser.parse_args()

def get_save_dir_name(model_stub):
    # Save under output/<model-name>_<timestamp> to avoid collisions between runs
    dir_name = f"{model_stub.split('/')[-1]}_{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
    return str(Path("output") / dir_name)

recipe = args.recipe
model_stub = tiny_random_llama_stub if args.small else args.model_stub 
dataset = args.dataset
max_seq_length = args.max_seq_length
output_dir = args.output_dir or get_save_dir_name(model_stub)
num_calibration_samples = args.num_calibration_samples
device = "cuda"

oneshot(
        model=model_stub,
        dataset=dataset,
        overwrite_output_dir=True,
        output_dir=output_dir,
        max_seq_length=max_seq_length,
        num_calibration_samples=num_calibration_samples,
        recipe=recipe,
        oneshot_device=device,
)


# try reloading the model

model_new = SparseAutoModelForCausalLM.from_pretrained(output_dir)
print("Model reloaded successfully!")

Output

Command

python local/feature/get_quant_model.py --small \
    --recipe local/feature/gptq_ux/recipes/recipe_config_groups.yaml

STDOUT

# Output from running the example command
2024-05-09 20:45:40 sparseml.transformers.finetune.session_mixin INFO  ...
Model reloaded successfully!

@mgoin
Member

mgoin commented May 13, 2024

Do we still need the ignore list if we have a targets list? It would be great if we didn't need architecture-specific ignores like LlamaRMSNorm.

Side note: vLLMQuantizationModifier is a dangerous name to keep around; I would prefer if we didn't keep this as a modifier.

@Satrat
Contributor

Satrat commented May 15, 2024

Do we still need the ignore list if we have a targets list? It would be great if we didn't need architecture-specific ignores like LlamaRMSNorm.

Side note: vLLMQuantizationModifier is a dangerous name to keep around; I would prefer if we didn't keep this as a modifier.

Yeah, we can safely delete the ignore list; we only need to add a module to the ignore list if it would otherwise be covered by one of the config groups.

The vLLMQuantizationModifier vs. regular QuantizationModifier naming is just to differentiate between the old and new quantization frameworks for now. We're going to get rid of the old framework soon, and at that point we can rename the modifier. But if the name itself is an immediate problem, sure, we can change it.
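
To illustrate the point above with a hypothetical sketch (import path and keyword names are assumptions, not the confirmed API): a config group targeting "Linear" would also cover lm_head, so lm_head goes in ignore, while non-Linear modules such as LlamaRMSNorm never match and need no entry.

# Hypothetical sketch (assumed import path and kwargs): the ignore list only
# needs modules that a config group would otherwise match. lm_head is a
# Linear, so it is listed; LlamaRMSNorm is not a Linear, so it is omitted.
from sparseml.modifiers.quantization.gptq import GPTQModifier

gptq = GPTQModifier(
    ignore=["lm_head"],  # would otherwise be matched by the "Linear" target
    config_groups={
        "group_0": {
            "targets": ["Linear"],
            "weights": {"num_bits": 8, "type": "int", "symmetric": True},
        }
    },
)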

Contributor

@Satrat Satrat left a comment


Some small recipe nitpicks; also, can we add a unit test for this new feature?

src/sparseml/modifiers/quantization/gptq/base.py (review thread, outdated, resolved)
src/sparseml/modifiers/quantization/gptq/pytorch.py (review thread, outdated, resolved)
rahul-tuli and others added 3 commits May 20, 2024 17:53
* test

* Preserve weight sparsity if greater than threshold

* Add argument to preserve sparsity mask in SPARSEGPT

* fix case when mask is none

* Add test to check mask_structure
- initial mask structure should be preserved
b/w consecutive runs; added test to check this

* Update tensor_follows_mask_structure to check for at least n zeros

---------

Co-authored-by: Sara Adkins <sara@neuralmagic.com>
@rahul-tuli rahul-tuli changed the base branch from create-gptq-modifier to update-tests May 20, 2024 18:22
Base automatically changed from update-tests to quant-modifier-ux May 20, 2024 18:56
@rahul-tuli rahul-tuli merged commit 93300b1 into quant-modifier-ux May 20, 2024
@rahul-tuli rahul-tuli deleted the gptq-ux-config-groups branch May 20, 2024 19:07
bfineran pushed a commit that referenced this pull request May 22, 2024
* Split WandaPruningModifier and SparseGPTModifier
Make sparsegpt not inherit from wanda modifier
Decouple SparseGPTModifierPyTorch from WandaPruningModifier
Fix docstrings

* Split SparseGPT and GPTQ modifiers (#2272)

* Update OBCQ

* Extract GPTQ Modifier

* [GPTQ Modifier UX] Update tests to use GPTQModifier for obcq style quantization (#2294)

* Update OBCQ

* Extract GPTQ Modifier

* Update test recipes

* GPTQ UX config groups support (#2273)

* Update OBCQ

* Extract GPTQ Modifier

* Update test recipes

* Add config_groups support to GPTQModifier

* mask_structure preservation test (#2284)

* test

* Preserve weight sparsity if greater than threshold

* Add argument to preserve sparsity mask in SPARSEGPT

* fix case when mask is none

* Add test to check mask_structure
- initial mask structure should be preserved
b/w consecutive runs; added test to check this

* Update tensor_follows_mask_structure to check for at least n zeros

---------

Co-authored-by: Sara Adkins <sara@neuralmagic.com>

* PR comments

---------

Co-authored-by: Sara Adkins <sara@neuralmagic.com>

* Fix default case

* Update test to use new vLLMQuantizationModifier

* Style

---------

Co-authored-by: Sara Adkins <sara@neuralmagic.com>