
Llama2 GPTQ training does not work #599

Closed
Napuh opened this issue Sep 18, 2023 · 0 comments · Fixed by #604
Labels
bug Something isn't working

Comments

Napuh (Contributor) commented Sep 18, 2023

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Execute finetune.py with examples/llama-2/gptq-lora.yml.

Execution should not throw any error and the model should train fine.

Current behaviour

Execution throws an error after a while; the trainer never starts.

Error thrown:

[2023-09-18 11:25:48,695] [INFO] [axolotl.train.train:57] [PID:6348] [RANK:0] loading model and (optionally) peft_config...
[2023-09-18 11:28:06,557] [ERROR] [axolotl.load_model:321] [PID:6348] [RANK:0] Exception raised attempting to load model, retrying with AutoModelForCausalLM
[2023-09-18 11:28:06,557] [ERROR] [axolotl.load_model:324] [PID:6348] [RANK:0] Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
Traceback (most recent call last):
  File "/root/axolotl/src/axolotl/utils/models.py", line 272, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3287, in from_pretrained
    model = quantizer.post_init_model(model)
  File "/opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
Traceback (most recent call last):
  File "/root/axolotl/src/axolotl/utils/models.py", line 272, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3287, in from_pretrained
    model = quantizer.post_init_model(model)
  File "/opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/axolotl/scripts/finetune.py", line 52, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/axolotl/scripts/finetune.py", line 48, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/root/axolotl/src/axolotl/train.py", line 58, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/root/axolotl/src/axolotl/utils/models.py", line 325, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3287, in from_pretrained
    model = quantizer.post_init_model(model)
  File "/opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'scripts/finetune.py', 'examples/llama-2/gptq-lora.yml']' returned non-zero exit status 1.
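
The message points at the exllama backend. For reference, recent transformers releases can also receive this flag at load time through the quantization config rather than by editing config.json; a minimal sketch (assuming a transformers version where GPTQConfig exposes disable_exllama, and loading the model directly rather than through axolotl):

# Sketch: load the GPTQ checkpoint with the exllama backend disabled via
# GPTQConfig instead of hand-editing config.json. This is not how axolotl
# calls from_pretrained internally.
from transformers import AutoModelForCausalLM, GPTQConfig

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",
    device_map="auto",
    quantization_config=GPTQConfig(bits=4, disable_exllama=True),
)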

As suggested by @NanoCode012, I changed the model's config.json to add "disable_exllama": true in the quantization_config section. This throws a different error:

Traceback (most recent call last):
  File "/root/axolotl/scripts/finetune.py", line 52, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/axolotl/scripts/finetune.py", line 48, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/root/axolotl/src/axolotl/train.py", line 58, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/root/axolotl/src/axolotl/utils/models.py", line 420, in load_model
    log_gpu_memory_usage(LOG, "after adapters", model.device)
  File "/root/axolotl/src/axolotl/utils/bench.py", line 37, in log_gpu_memory_usage
    usage, cache, misc = gpu_memory_usage_all(device)
  File "/root/axolotl/src/axolotl/utils/bench.py", line 13, in gpu_memory_usage_all
    usage = torch.cuda.memory_allocated(device) / 1024.0**3
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py", line 351, in memory_allocated
    return memory_stats(device=device).get("allocated_bytes.all.current", 0)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py", line 230, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py", line 241, in memory_stats_as_nested_dict
    device = _get_device_index(device, optional=True)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/_utils.py", line 32, in _get_device_index
    raise ValueError('Expected a cuda device, but got: {}'.format(device))
ValueError: Expected a cuda device, but got: cpu
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'scripts/finetune.py', 'examples/llama-2/gptq-lora.yml']' returned non-zero exit status 1.

GPU memory should be sufficient (24 GB RTX 3090).
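
For reference, this second traceback comes from axolotl's GPU memory logging being handed a CPU device once the adapters are attached (model.device is cpu at that point). A guard along these lines would avoid the crash; this is only a simplified sketch of axolotl's log_gpu_memory_usage, not necessarily the actual fix in #604:

import torch

def log_gpu_memory_usage(log, msg, device):
    # Sketch of a guard: skip CUDA memory stats when the reported device
    # is not a CUDA device, instead of crashing in torch.cuda.memory_allocated.
    if not torch.cuda.is_available() or torch.device(device).type != "cuda":
        log.info("GPU memory usage %s: skipped, model is on %s", msg, device)
        return
    usage = torch.cuda.memory_allocated(device) / 1024.0**3
    log.info("GPU memory usage %s: %.3f GB allocated", msg, usage)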

Steps to reproduce

1. Clone the repository
2. Install dependencies
3. Run accelerate launch scripts/finetune.py examples/llama-2/gptq-lora.yml

To change the config.json file, I downloaded the model data into a local folder using the following code:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="TheBloke/Llama-2-7B-GPTQ", local_dir="llama2gptq")

And then added "disable_exllama": true in the quantization_config section of the file.
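
For reference, the same edit can be applied programmatically to the snapshot downloaded above (a minimal sketch; the llama2gptq path matches the snippet above):

import json

# Add "disable_exllama": true to the quantization_config section of the
# downloaded model's config.json.
config_path = "llama2gptq/config.json"

with open(config_path) as f:
    config = json.load(f)

config["quantization_config"]["disable_exllama"] = True

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)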

Possible solution

No response

Which Operating Systems are you using?

  • Linux

Python Version

3.10.11

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.