
Add AutoGPTQ quantization script #545

Draft · wants to merge 24 commits into base: main
Conversation

@Glavin001 (Contributor) commented Sep 9, 2023

Let's wait until #521 is merged.


Closes #491

Quantize models automatically with Axolotl; no custom scripts required.

Demo

[demo screenshot]

How to try yourself

Create a quantized model with Axolotl in 3 steps:

1️⃣ Train

accelerate launch ./scripts/finetune.py ./examples/llama-2/lora.yml

2️⃣ Merge

accelerate launch ./scripts/finetune.py ./examples/llama-2/lora.yml --merge_lora

3️⃣ 🆕 Quantize

accelerate launch ./scripts/finetune.py ./examples/llama-2/lora.yml --quantize

Progress:
Look for logging lines such as:

# [2023-09-11 07:20:37,502] [INFO] [auto_gptq.modeling._base.quantize:364] [PID:3962] [RANK:0] Quantizing self_attn.k_proj in layer 4/32...

This shows that layer 4 of 32 is currently being quantized.

Task list

  • Rewrote AutoGPTQ's advanced quantization script to leverage the Axolotl config & internal functions (loading the tokenizer & models, merging models, and loading datasets all from the Axolotl config .yml file): https://github.com/PanQiWei/AutoGPTQ/blob/main/examples/quantization/quant_with_alpaca.py (see the sketch after this list for the core AutoGPTQ calls it wraps)
  • Will add another callback to automatically merge and quantize upon completion, if enabled in the config
    • I couldn't figure out how to release the existing model from GPU memory, so I couldn't run the merge directly afterwards in the same process and had to make these separate steps.
    • Add --quantize CLI option
  • Get others to test and provide initial feedback
  • Clean up PR / old code / etc.
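
For reference, a minimal sketch of the core AutoGPTQ flow the script builds on (the paths and calibration examples below are placeholders; the actual script wires these values in from the Axolotl config rather than hard-coding them):

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

# Placeholder paths; in this PR they are derived from the Axolotl config .yml.
merged_model_dir = "./lora-out/merged"
quantized_model_dir = "./lora-out/quantized"

tokenizer = AutoTokenizer.from_pretrained(merged_model_dir, use_fast=True)

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize weights to 4 bits
    group_size=128,  # quantization group size
    desc_act=False,  # set True for slightly better quality at slower inference
)

# Load the merged full-precision model with the quantization config attached.
model = AutoGPTQForCausalLM.from_pretrained(merged_model_dir, quantize_config)

# Calibration samples; the script builds these from the dataset in the Axolotl config.
examples = [
    tokenizer("axolotl makes it easy to fine-tune and quantize llama models", return_tensors="pt")
]

model.quantize(examples)

model.save_quantized(quantized_model_dir, use_safetensors=True)
tokenizer.save_pretrained(quantized_model_dir)
```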

# import debugpy
# debugpy.listen(('0.0.0.0', 5678))
# debugpy.wait_for_client()
# debugpy.breakpoint()
@Glavin001 (Contributor Author): Clean up old code.

prompter = AlpacaPrompter()

# def load_data(data_path, tokenizer, n_samples, template=TEMPLATE):
def load_data(data_path, tokenizer, n_samples):
@Glavin001 (Contributor Author): Delete this; there is a new method using Axolotl's built-in functions.

)

# TEMPLATE = "<|prompt|>{instruction}</s><|answer|>"
prompter = AlpacaPrompter()
@Glavin001 (Contributor Author): Delete; this now uses the Axolotl config and built-in functions.

# huggingface_username = "CHANGE_ME"
## CHANGE ABOVE

quantize_config = BaseQuantizeConfig(
@Glavin001 (Contributor Author): Add this to the Axolotl config?

cc @winglian @tmm1 @NanoCode012: Would you recommend leaving these as defaults, or exposing them as options in the Axolotl config file?
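
One possible direction, assuming the quantization runs inside a callback that already has the loaded Axolotl cfg (the gptq_* key names below are hypothetical, not existing Axolotl options):

```python
from auto_gptq import BaseQuantizeConfig

# Hypothetical config keys with the current hard-coded values as defaults;
# `cfg` is assumed to be the loaded Axolotl config (dict-like).
quantize_config = BaseQuantizeConfig(
    bits=cfg.get("gptq_bits", 4),
    group_size=cfg.get("gptq_group_size", 128),
    desc_act=cfg.get("gptq_desc_act", False),
)
```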

print("Done importing...")

## CHANGE BELOW ##
config_path: Path = Path("./examples/llama-2/lora.yml")
@Glavin001 (Contributor Author): Replace the hard-coded path with the current config from the Axolotl callback.

configure_logging()
LOG = logging.getLogger("axolotl")

# logging.basicConfig(
@Glavin001 (Contributor Author): Help wanted

I couldn't get any logging output from AutoGPTQ to show up. It would be nice to fix logging.
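
A possible workaround to try (an assumption, not verified in this PR): AutoGPTQ logs under the auto_gptq logger namespace, as the progress line above suggests, so attaching a handler to that logger explicitly may surface its messages even after configure_logging() has set up Axolotl's own handlers:

```python
import logging

# Attach a dedicated handler to AutoGPTQ's logger namespace so its INFO-level
# progress messages (e.g. "Quantizing self_attn.k_proj in layer 4/32...") show up.
gptq_logger = logging.getLogger("auto_gptq")
gptq_logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[%(asctime)s] [%(levelname)s] [%(name)s] %(message)s"))
gptq_logger.addHandler(handler)
gptq_logger.propagate = False  # avoid duplicate lines if a root handler also exists
```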

@Glavin001 (Contributor Author): This is my old code, which works when Axolotl's configure_logging() is not called.

print("Merged model not found. Merging...")
# model, tokenizer = load_model(cfg, inference=True)
# do_merge_lora_model_and_tokenizer(cfg=cfg, model=model, tokenizer=tokenizer)
raise NotImplementedError("Merging model is not implemented yet.")
@Glavin001 (Contributor Author): TODO: implement this so that quantization has a merged model to work with.
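
For what the TODO might look like, a hedged sketch using PEFT directly (cfg.base_model and cfg.lora_model_dir are assumed to come from the loaded Axolotl config; the intent in this PR is to reuse Axolotl's own merge helpers instead):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_out_dir = "./lora-out/merged"  # placeholder output path

# Load the base model, apply the trained LoRA adapter, and fold the adapter
# weights into the base weights so AutoGPTQ has a plain model to quantize.
base = AutoModelForCausalLM.from_pretrained(cfg.base_model, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, cfg.lora_model_dir)
merged = model.merge_and_unload()

merged.save_pretrained(merged_out_dir)
AutoTokenizer.from_pretrained(cfg.base_model).save_pretrained(merged_out_dir)
```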

# accelerate launch ./scripts/finetune.py ./examples/llama-2/lora.yml --merge_lora --lora_model_dir="./lora-out" --load_in_8bit=False --load_in_4bit=False
# CUDA_VISIBLE_DEVICES="1" accelerate launch ./scripts/finetune.py ./examples/llama-2/lora.yml --merge_lora --lora_model_dir="./lora-out" --load_in_8bit=False --load_in_4bit=False

# HUB_MODEL_ID="Glavin001/llama-2-7b-alpaca_2k_test" accelerate launch ./scripts/quantize.py
@Glavin001 (Contributor Author): Delete these test notes.

cfg.wandb_project = os.environ.get("WANDB_PROJECT")

if os.environ.get("HUB_MODEL_ID") and len(os.environ.get("HUB_MODEL_ID", "")) > 0:
    cfg.hub_model_id = os.environ.get("HUB_MODEL_ID")
@Glavin001 (Contributor Author): FYI, this is for upcoming work on launching scripts/finetune.py so that it can run without any custom, run-specific, or user-specific info in the Axolotl config.

Collaborator: This might be better off in the setup_wandb_env_vars function.
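
For illustration only, a hypothetical helper showing what colocating these overrides could look like (this is not the real setup_wandb_env_vars, whose signature and behavior in Axolotl may differ):

```python
import os

def apply_env_var_overrides(cfg):
    """Hypothetical helper: copy selected environment variables into the config
    so no run-specific values have to live in the shared .yml file."""
    if os.environ.get("WANDB_PROJECT"):
        cfg.wandb_project = os.environ["WANDB_PROJECT"]
    if os.environ.get("HUB_MODEL_ID"):
        cfg.hub_model_id = os.environ["HUB_MODEL_ID"]
```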

@Glavin001 marked this pull request as ready for review September 11, 2023 07:59
@Glavin001 changed the title from "WIP Add AutoGPTQ quantization script" to "Add AutoGPTQ quantization script" Sep 11, 2023
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
# tokenizer = None
should_quantize = True
@Glavin001 (Contributor Author): TODO: make this based on the config.
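
A hypothetical way to derive the flag instead of hard-coding it (parsed_cfg and parsed_cli_args come from the surrounding snippet; a quantize config key does not exist yet and is named purely for illustration):

```python
# Prefer an explicit --quantize CLI flag, fall back to an opt-in config key.
should_quantize = bool(
    getattr(parsed_cli_args, "quantize", False) or parsed_cfg.get("quantize", False)
)
```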


log_gpu_memory()

do_merge_lora(cfg=parsed_cfg, cli_args=parsed_cli_args)
@Glavin001 (Contributor Author) commented Sep 11, 2023: Help wanted

I kept getting:

Expected a cuda device, but got: cpu

when calling do_merge_lora. Running nvidia-smi always showed a lot of GPU memory still taken up / unreleased.
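
One common pattern to try (untested here; model and trainer stand for whatever references are still held after train() returns) is to drop the references and clear the CUDA cache before running the merge in the same process:

```python
import gc
import torch

# Drop the remaining Python references so the weights become garbage-collectable,
# then ask PyTorch's CUDA caching allocator to release its cached blocks.
del model
del trainer
gc.collect()
torch.cuda.empty_cache()
```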

@Glavin001 marked this pull request as draft September 12, 2023 07:01
Development

Successfully merging this pull request may close these issues.

Built-in script to quantize with AutoGPTQ then push to Huggingface