
Wanda #1834

Merged: 17 commits merged into main on Dec 28, 2023
Conversation

@rahul-tuli (Member) commented on Nov 15, 2023

Initial implementation of Wanda, updated to use the same memory-saving tricks as the OBCQ LayerCompressor and SparseGPT.
Research Paper Link: https://arxiv.org/abs/2306.11695
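
For context, the Wanda criterion from the paper scores each weight by its magnitude multiplied by the L2 norm of the corresponding input activation, then removes the lowest-scoring weights within each output row. A minimal sketch of that scoring rule (a hypothetical helper for illustration, not the modifier's actual internals):

```python
# Sketch of the Wanda criterion (https://arxiv.org/abs/2306.11695).
# Hypothetical helper, not the WandaPruningModifier's implementation.
import torch

def wanda_prune(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Prune `weight` (out_features x in_features) to the target sparsity.

    `act_norm` holds the L2 norm of each input feature's calibration
    activations (shape: in_features). Score = |W_ij| * ||X_j||_2; the
    lowest-scoring weights are zeroed within each output row.
    """
    scores = weight.abs() * act_norm.unsqueeze(0)    # (out, in)
    num_prune = int(weight.shape[1] * sparsity)      # weights dropped per row
    prune_idx = torch.argsort(scores, dim=1)[:, :num_prune]
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)                 # zero the lowest scores
    return weight * mask
```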

Smaller test recipe (targets just one layer):

# wanda_small_recipe.yaml

test_stage:
  pruning_modifiers:
    WandaPruningModifier:
      sparsity: 0.5
      targets: [
        "model.layers.0",
      ]
      leave_enabled: True
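
As a quick sanity check of the recipe structure above, it can be loaded with PyYAML; this only illustrates the file layout and is not how SparseML parses recipes internally:

```python
import yaml  # assumes PyYAML is installed

with open("local/wanda/wanda_small_recipe.yaml") as f:
    recipe = yaml.safe_load(f)

# stage -> modifier group -> modifier -> parameters
wanda = recipe["test_stage"]["pruning_modifiers"]["WandaPruningModifier"]
assert wanda["sparsity"] == 0.5
assert wanda["targets"] == ["model.layers.0"]
assert wanda["leave_enabled"] is True
```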
  • A smaller test command:

python src/sparseml/transformers/sparsification/obcq/obcq.py Xenova/llama2.c-stories15M c4 --recipe local/wanda/wanda_small_recipe.yaml --eval wikitext2 --nsamples 128

2023-11-16 12:27:37 __main__     INFO     Running one_shot on device cuda:0
Repo card metadata block was not found. Setting CardData to empty.
/home/rahul/projects/.venv/lib/python3.11/site-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
  table = cls._concat_blocks(blocks, axis=0)
Token indices sequence length is longer than the specified maximum sequence length for this model (57579 > 2048). Running this sequence through the model will result in indexing errors
2023-11-16 12:27:55 sparseml.modifiers.pruning.wanda.pytorch INFO     
===== Compressing layer 1/1 to sparsity 0.5 =====
2023-11-16 12:27:55 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module self_attn.q_proj of layer 0
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.81
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module self_attn.k_proj of layer 0
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module self_attn.v_proj of layer 0
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module self_attn.o_proj of layer 0
2023-11-16 12:27:56 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module mlp.gate_proj of layer 0
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module mlp.up_proj of layer 0
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.layer_compressor INFO     Compressing module mlp.down_proj of layer 0
2023-11-16 12:27:57 sparseml.modifiers.pruning.wanda.utils.wrapped_gpt INFO     time 0.00
Token indices sequence length is longer than the specified maximum sequence length for this model (341469 > 2048). Running this sequence through the model will result in indexing errors
2023-11-16 12:30:18 sparseml.modifiers.obcq.utils.helpers INFO     Evaluating perplexity...
  • Test command for llama-7B (longer):

python src/sparseml/transformers/sparsification/obcq/obcq.py /home/rahul/llama/training c4 --recipe local/wanda/wanda_small_recipe.yaml --eval wikitext2 --nsamples 128 --precision full
2023-12-18 12:15:04 __main__     INFO     Running one_shot on device cuda:0
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.04it/s]
...
...
2023-12-18 12:16:06 sparseml.core.recipe.recipe INFO     Loading recipe from file /home/rahul/llama/training/recipe.yaml
2023-12-18 12:16:06 sparseml.pytorch.model_load.helpers INFO     Applied a staged recipe with 0 stages to the model at /home/rahul/llama/training
2023-12-18 12:16:06 sparseml.core.recipe.recipe INFO     Loading recipe from file local/wanda/wanda_small_recipe.yaml
2023-12-18 12:16:08 sparseml.modifiers.pruning.wanda.pytorch INFO     
===== Compressing layer 1/1 to sparsity 0.5 =====
2023-12-18 12:16:09 sparseml.modifiers.utils.layer_compressor INFO     Compressing module self_attn.q_proj of layer 0
2023-12-18 12:16:11 sparseml.modifiers.pruning.wanda.utils.module_compressor INFO     time 2.59
2023-12-18 12:16:11 sparseml.modifiers.utils.layer_compressor INFO     Compressing module self_attn.k_proj of layer 0
2023-12-18 12:16:14 sparseml.modifiers.pruning.wanda.utils.module_compressor INFO     time 2.43
...
...
2023-12-18 12:16:24 sparseml.modifiers.utils.layer_compressor INFO     Compressing module mlp.down_proj of layer 0
2023-12-18 12:16:26 sparseml.modifiers.pruning.wanda.utils.module_compressor INFO     time 2.47
2023-12-18 12:18:43 sparseml.modifiers.obcq.utils.helpers INFO     Evaluating perplexity...
2023-12-18 12:19:51 sparseml.modifiers.obcq.utils.helpers INFO     tensor(5.0572, device='cuda:0')
2023-12-18 12:20:58 sparseml.modifiers.obcq.utils.helpers INFO     tensor(5.5654, device='cuda:0')
...
...
2023-12-18 12:30:43 sparseml.modifiers.obcq.utils.helpers INFO     Perplexity: 5.141213
  • Also ran OBCQ on llama-7B to verify that it still works as expected; the command kicked off fine. Waiting for completion and will post the results as soon as it finishes.

The requested changes have been added in a stacked-PR fashion; use the itemized list below to navigate.

Major changes include:

  • Adding a WandaPruningModifier
  • Wanda is a general case of OBCQ; leverage this fact to share common code between the two algorithms (see the sketch after this list):
    • Extract a base class LayerCompressor; leave algorithm-specific implementations to WandaLayerCompressor/OBCQLayerCompressor
    • Extract a base class ModuleCompressor from SparseGPT and WandaModuleCompressor; leave algorithm-specific implementations to each algorithm
    • Make the SparseGPT modifier inherit from WandaModifier and specialize it for running the OBCQ algorithm
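
A hedged sketch of the class split described above; the class bodies and method signatures here are illustrative, not necessarily the exact ones introduced by the PR:

```python
from abc import ABC, abstractmethod

class LayerCompressor(ABC):
    """Shared plumbing: walk a layer's prunable modules and hand each one
    to an algorithm-specific compression routine."""

    def compress(self, modules: dict) -> None:
        for name, module in modules.items():
            self.compress_module(name, module)

    @abstractmethod
    def compress_module(self, name: str, module) -> None:
        """Algorithm-specific pruning of a single module."""

class WandaLayerCompressor(LayerCompressor):
    def compress_module(self, name: str, module) -> None:
        pass  # prune by |W| * input-activation-norm score; no weight update

class OBCQLayerCompressor(LayerCompressor):
    def compress_module(self, name: str, module) -> None:
        pass  # SparseGPT-style pruning with Hessian-based weight updates
```

The same pattern applies one level up: the SparseGPT modifier subclasses the Wanda modifier and overrides only the OBCQ-specific steps.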

@Satrat (Contributor) left a comment

This is a lot of shared code with the SparseGPTModifier. If this needs to get pushed in quickly to support research, then sure, ship it! But if not, I'm uncomfortable pushing this since there's so much duplicated code across multiple files. Rather than moving the shared code around, could the Wanda modifier just inherit from SparseGPT and override functions as necessary? We do something similar for the SmoothQuant and LogQuant modifiers.

* Define GPT contract

* rename tmp -> batch_size

* Define LayerCompressor Contract

* Rename gpt_helpers to gpts
Fix some docstrings

* add named argument to function call

* Wanda/OBCQ refactor

* propagate target-ids

* Address review comments from #1885 and #1886
@Satrat (Contributor) left a comment

LGTM! One nit: I'm thinking we should move the obcq folder to pruning/obcq?

@rahul-tuli (Member, Author) replied:

> LGTM! One nit: I'm thinking we should move the obcq folder to pruning/obcq?

I agree, let's do that as a follow-up PR.

@rahul-tuli merged commit 239db82 into main on Dec 28, 2023
12 checks passed
@rahul-tuli deleted the wanda branch on December 28, 2023 at 13:41