
Llama2 GPTQ training does not work #599

Closed
Napuh opened this issue Sep 18, 2023 · 0 comments · Fixed by #604
Labels
bug Something isn't working

Comments

Napuh (Contributor) commented Sep 18, 2023

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Execute finetune.py with examples/llama-2/gptq-lora.yml.

Execution should not throw any error and the model should train fine.

Current behaviour

Execution throws an error after a while; the trainer never starts.

Error thrown:

[2023-09-18 11:25:48,695] [INFO] [axolotl.train.train:57] [PID:6348] [RANK:0] loading model and (optionally) peft_config...
[2023-09-18 11:28:06,557] [ERROR] [axolotl.load_model:321] [PID:6348] [RANK:0] Exception raised attempting to load model, retrying with AutoModelForCausalLM
[2023-09-18 11:28:06,557] [ERROR] [axolotl.load_model:324] [PID:6348] [RANK:0] Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
Traceback (most recent call last):
  File "/root/axolotl/src/axolotl/utils/models.py", line 272, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3287, in from_pretrained
    model = quantizer.post_init_model(model)
  File "/opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
Traceback (most recent call last):
  File "/root/axolotl/src/axolotl/utils/models.py", line 272, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3287, in from_pretrained
    model = quantizer.post_init_model(model)
  File "/opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/axolotl/scripts/finetune.py", line 52, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/axolotl/scripts/finetune.py", line 48, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/root/axolotl/src/axolotl/train.py", line 58, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/root/axolotl/src/axolotl/utils/models.py", line 325, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3287, in from_pretrained
    model = quantizer.post_init_model(model)
  File "/opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'scripts/finetune.py', 'examples/llama-2/gptq-lora.yml']' returned non-zero exit status 1.
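
The message points at the exllama backend. For reference, recent transformers releases can also receive this flag at load time through the quantization config rather than by editing config.json; a minimal sketch (assuming a transformers version where GPTQConfig exposes disable_exllama, and loading the model directly rather than through axolotl):

# Sketch: load the GPTQ checkpoint with the exllama backend disabled via
# GPTQConfig instead of hand-editing config.json. This is not how axolotl
# calls from_pretrained internally.
from transformers import AutoModelForCausalLM, GPTQConfig

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",
    device_map="auto",
    quantization_config=GPTQConfig(bits=4, disable_exllama=True),
)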

As suggested by @NanoCode012, I changed the model's config.json to add "disable_exllama": true in the quantization_config section. This throws a different error:

Traceback (most recent call last):
  File "/root/axolotl/scripts/finetune.py", line 52, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/axolotl/scripts/finetune.py", line 48, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/root/axolotl/src/axolotl/train.py", line 58, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/root/axolotl/src/axolotl/utils/models.py", line 420, in load_model
    log_gpu_memory_usage(LOG, "after adapters", model.device)
  File "/root/axolotl/src/axolotl/utils/bench.py", line 37, in log_gpu_memory_usage
    usage, cache, misc = gpu_memory_usage_all(device)
  File "/root/axolotl/src/axolotl/utils/bench.py", line 13, in gpu_memory_usage_all
    usage = torch.cuda.memory_allocated(device) / 1024.0**3
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py", line 351, in memory_allocated
    return memory_stats(device=device).get("allocated_bytes.all.current", 0)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py", line 230, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py", line 241, in memory_stats_as_nested_dict
    device = _get_device_index(device, optional=True)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/_utils.py", line 32, in _get_device_index
    raise ValueError('Expected a cuda device, but got: {}'.format(device))
ValueError: Expected a cuda device, but got: cpu
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'scripts/finetune.py', 'examples/llama-2/gptq-lora.yml']' returned non-zero exit status 1.

GPU memory should be sufficient (24 GB RTX 3090).
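
For reference, this second traceback comes from axolotl's GPU memory logging being handed a CPU device once the adapters are attached (model.device is cpu at that point). A guard along these lines would avoid the crash; this is only a simplified sketch of axolotl's log_gpu_memory_usage, not necessarily the actual fix in #604:

import torch

def log_gpu_memory_usage(log, msg, device):
    # Sketch of a guard: skip CUDA memory stats when the reported device
    # is not a CUDA device, instead of crashing in torch.cuda.memory_allocated.
    if not torch.cuda.is_available() or torch.device(device).type != "cuda":
        log.info("GPU memory usage %s: skipped, model is on %s", msg, device)
        return
    usage = torch.cuda.memory_allocated(device) / 1024.0**3
    log.info("GPU memory usage %s: %.3f GB allocated", msg, usage)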

Steps to reproduce

1. Clone the repository
2. Install dependencies
3. Run accelerate launch scripts/finetune.py examples/llama-2/gptq-lora.yml

To change the config.json file, I downloaded the model data into a local folder using the following code:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="TheBloke/Llama-2-7B-GPTQ", local_dir="llama2gptq")

And then added "disable_exllama": true in the quantization_config section of the file.
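
For reference, the same edit can be applied programmatically to the snapshot downloaded above (a minimal sketch; the llama2gptq path matches the snippet above):

import json

# Add "disable_exllama": true to the quantization_config section of the
# downloaded model's config.json.
config_path = "llama2gptq/config.json"

with open(config_path) as f:
    config = json.load(f)

config["quantization_config"]["disable_exllama"] = True

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)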

Possible solution

No response

Which Operating Systems are you using?

  • Linux

Python Version

3.10.11

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.