
Wanda #1834

Merged: 17 commits merged into main on Dec 28, 2023
Conversation

@rahul-tuli (Member) commented on Nov 15, 2023

Initial implementation of Wanda, updated to use the same memory-saving tricks as the OBCQ LayerCompressor and SparseGPT.
Research Paper Link: https://arxiv.org/abs/2306.11695
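
For context, the Wanda criterion from the paper scores each weight by its magnitude multiplied by the L2 norm of the corresponding input activation, then removes the lowest-scoring weights within each output row. A minimal sketch of that scoring rule (a hypothetical helper for illustration, not the modifier's actual internals):

```python
# Sketch of the Wanda criterion (https://arxiv.org/abs/2306.11695).
# Hypothetical helper, not the WandaPruningModifier's implementation.
import torch

def wanda_prune(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Prune `weight` (out_features x in_features) to the target sparsity.

    `act_norm` holds the L2 norm of each input feature's calibration
    activations (shape: in_features). Score = |W_ij| * ||X_j||_2; the
    lowest-scoring weights are zeroed within each output row.
    """
    scores = weight.abs() * act_norm.unsqueeze(0)    # (out, in)
    num_prune = int(weight.shape[1] * sparsity)      # weights dropped per row
    prune_idx = torch.argsort(scores, dim=1)[:, :num_prune]
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)                 # zero the lowest scores
    return weight * mask
```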

Smaller test recipe (targets just one layer):

# wanda_small_recipe.yaml

test_stage:
  pruning_modifiers:
    WandaPruningModifier:
      sparsity: 0.5
      targets: [
        "model.layers.0",
      ]
      leave_enabled: True
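
As a quick sanity check of the recipe structure above, it can be loaded with PyYAML; this only illustrates the file layout and is not how SparseML parses recipes internally:

```python
import yaml  # assumes PyYAML is installed

with open("local/wanda/wanda_small_recipe.yaml") as f:
    recipe = yaml.safe_load(f)

# stage -> modifier group -> modifier -> parameters
wanda = recipe["test_stage"]["pruning_modifiers"]["WandaPruningModifier"]
assert wanda["sparsity"] == 0.5
assert wanda["targets"] == ["model.layers.0"]
assert wanda["leave_enabled"] is True
```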
  • A smaller test command:

python src/sparseml/transformers/sparsification/obcq/obcq.py Xenova/llama2.c-stories15M c4 --recipe local/wanda/wanda_small_recipe.yaml --eval wikitext2 --nsamples 128

2023-11-16 12:27:37 __main__     INFO     Running one_shot on device cuda:0
Repo card metadata block was not found. Setting CardData to empty.
/home/rahul/projects/.venv/lib/python3.11/site-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
  table = cls._concat_blocks(blocks, axis=0)
Token indices sequence length is longer than the specified maximum sequence length for this model (57579 > 2048). Running this sequence through the model will result in indexing errors
2023-11-16 12:27:55 sparseml.modifiers.pruning.wanda.pytorch INFO     
===== Compressing layer 1/1 to sparsity 0.5 =====
2023-11-16 12:27:55 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module self_attn.q_proj of layer 0
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.81
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module self_attn.k_proj of layer 0
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module self_attn.v_proj of layer 0
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module self_attn.o_proj of layer 0
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module mlp.gate_proj of layer 0
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module mlp.up_proj of layer 0
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module mlp.down_proj of layer 0
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
Token indices sequence length is longer than the specified maximum sequence length for this model (341469 > 2048). Running this sequence through the model will result in indexing errors
2023-11-16 12:30:18 sparseml.modifiers.obcq.utils.helpers INFO     Evaluating perplexity...
  • Test command for llama-7B (longer):

python src/sparseml/transformers/sparsification/obcq/obcq.py /home/rahul/llama/training c4 --recipe local/wanda/wanda_small_recipe.yaml --eval wikitext2 --nsamples 128 --precision full
2023-12-18 12:15:04 __main__     INFO     Running one_shot on device cuda:0
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.04it/s]
...
...
2023-12-18 12:16:06 sparseml.core.recipe.recipe INFO     Loading recipe from file /home/rahul/llama/training/recipe.yaml
2023-12-18 12:16:06 sparseml.pytorch.model_load.helpers INFO     Applied a staged recipe with 0 stages to the model at /home/rahul/llama/training
2023-12-18 12:16:06 sparseml.core.recipe.recipe INFO     Loading recipe from file local/wanda/wanda_small_recipe.yaml
2023-12-18 12:16:08 sparseml.modifiers.pruning.wanda.pytorch INFO     
===== Compressing layer 1/1 to sparsity 0.5 =====
2023-12-18 12:16:09 sparseml.modifiers.utils.layer_compressor INFO     Compressing module self_attn.q_proj of layer 0
2023-12-18 12:16:11 sparseml.modifiers.pruning.wanda.utils.module_compressor INFO     time 2.59
2023-12-18 12:16:11 sparseml.modifiers.utils.layer_compressor INFO     Compressing module self_attn.k_proj of layer 0
2023-12-18 12:16:14 sparseml.modifiers.pruning.wanda.utils.module_compressor INFO     time 2.43
...
...
2023-12-18 12:16:24 sparseml.modifiers.utils.layer_compressor INFO     Compressing module mlp.down_proj of layer 0
2023-12-18 12:16:26 sparseml.modifiers.pruning.wanda.utils.module_compressor INFO     time 2.47
2023-12-18 12:18:43 sparseml.modifiers.obcq.utils.helpers INFO     Evaluating perplexity...
2023-12-18 12:19:51 sparseml.modifiers.obcq.utils.helpers INFO     tensor(5.0572, device='cuda:0')
2023-12-18 12:20:58 sparseml.modifiers.obcq.utils.helpers INFO     tensor(5.5654, device='cuda:0')
...
...
2023-12-18 12:30:43 sparseml.modifiers.obcq.utils.helpers INFO     Perplexity: 5.141213
  • Also ran OBCQ on llama-7B to verify that it still works as expected; the command kicked off fine. Waiting for completion and will post the results as soon as it finishes.

The requested changes have been added in a stacked-PR fashion; use the itemized list below to navigate.

Major changes include:

  • Adding a WandaPruningModifier
  • Wanda is a general case of OBCQ; leverage this fact to share common code between the two algorithms (see the sketch after this list):
    • Extract a base class LayerCompressor; leave algorithm-specific implementations to WandaLayerCompressor/OBCQLayerCompressor
    • Extract a base class ModuleCompressor from SparseGPT and WandaModuleCompressor; leave algorithm-specific implementations to each algorithm
    • Make the SparseGPT modifier inherit from WandaModifier and specialize it for running the OBCQ algorithm
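
A hedged sketch of the class split described above; the class bodies and method signatures here are illustrative, not necessarily the exact ones introduced by the PR:

```python
from abc import ABC, abstractmethod

class LayerCompressor(ABC):
    """Shared plumbing: walk a layer's prunable modules and hand each one
    to an algorithm-specific compression routine."""

    def compress(self, modules: dict) -> None:
        for name, module in modules.items():
            self.compress_module(name, module)

    @abstractmethod
    def compress_module(self, name: str, module) -> None:
        """Algorithm-specific pruning of a single module."""

class WandaLayerCompressor(LayerCompressor):
    def compress_module(self, name: str, module) -> None:
        pass  # prune by |W| * input-activation-norm score; no weight update

class OBCQLayerCompressor(LayerCompressor):
    def compress_module(self, name: str, module) -> None:
        pass  # SparseGPT-style pruning with Hessian-based weight updates
```

The same pattern applies one level up: the SparseGPT modifier subclasses the Wanda modifier and overrides only the OBCQ-specific steps.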

@Satrat (Contributor) left a comment

This is a lot of shared code with the SparseGPTModifier. If this needs to get pushed in quickly to support research, then sure, ship it! But if not, I'm uncomfortable pushing this since there's so much duplicated code across multiple files. Rather than moving the shared code around, could the Wanda modifier just inherit from SparseGPT and override functions as necessary? We do something similar for the SmoothQuant and LogQuant modifiers.

* Define GPT contract

* rename tmp -> batch_size

* Define LayerCompressor Contract

* Rename gpt_helpers to gpts
Fix some docstrings

* add named argument to function call

* Wanda/OBCQ refactor

* propagate target-ids

* Address review comments from #1885 and #1886
@Satrat (Contributor) left a comment

LGTM! One nit: I'm thinking we should move the obcq folder to pruning/obcq?

@rahul-tuli (Member, Author) replied:

> LGTM! One nit: I'm thinking we should move the obcq folder to pruning/obcq?

I agree, let's do that as a follow-up PR.

@rahul-tuli merged commit 239db82 into main on Dec 28, 2023
12 checks passed
@rahul-tuli deleted the wanda branch on December 28, 2023 at 13:41