Falcon-7B finetuning errors with the example config #1145

Closed

radhacr opened this issue Jan 18, 2024 · 4 comments

Labels: bug (Something isn't working)

Comments


radhacr commented Jan 18, 2024

Please check that this issue hasn't been reported before.

  • I searched previous bug reports and didn't find any similar reports.

Expected Behavior

I'm testing out the falcon-7B finetuning example with the config file examples/falcon/config-7b-qlora.yml as is.

Current behaviour

As suggested in the README, I ran the following command:
accelerate launch -m axolotl.cli.train examples/falcon/config-7b-qlora.yml

It first fails with the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 43, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(   
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 26, in do_cli
    parsed_cfg = load_cfg(config, **kwargs)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/__init__.py", line 290, in load_cfg
    validate_config(cfg)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/config.py", line 349, in validate_config
    raise ValueError(
ValueError: ``early_stopping_patience`` requires save_steps and eval_steps to be set. eval_steps should evenly divide save_steps.
Traceback (most recent call last):
  File "/home/radhachitta/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

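Based on that validation message, the example config appears to trip this check because it sets evals_per_epoch and saves_per_epoch rather than eval_steps/save_steps. It seems the config would need either early_stopping_patience left unset, or explicit eval_steps and save_steps where eval_steps evenly divides save_steps, for example (a sketch; the step values below are illustrative, not from the example config):

# Option 1: disable early stopping (what I tried next)
early_stopping_patience:

# Option 2 (untested): keep early stopping and set explicit steps,
# with eval_steps evenly dividing save_steps
early_stopping_patience: 3
eval_steps: 20
save_steps: 40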
After unsetting early_stopping_patience (leaving it blank as early_stopping_patience:), I get this error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 43, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(   
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 38, in do_cli
    dataset_meta = load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/__init__.py", line 310, in load_datasets
    tokenizer = load_tokenizer(cfg)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 178, in load_tokenizer
    raise ValueError(
ValueError: Please set lora_modules_to_save to `embed_tokens`, `lm_head` when using an adapter and changing the special tokens.
Traceback (most recent call last):
  File "/home/radhachitta/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.10', '-m', 'axolotl.cli.train', 'examples/falcon/config-7b-qlora.yml']' returned non-zero exit status 1.

Finally, after setting lora_modules_to_save: embed_tokens, lm_head, I get this error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 43, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(   
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 39, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/train.py", line 65, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 634, in load_model
    model, lora_config = load_adapter(model, cfg, cfg.adapter)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 670, in load_adapter
    return load_lora(model, cfg, inference=inference)  
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 756, in load_lora
    model = get_peft_model(model, lora_config)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/mapping.py", line 133, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/peft_model.py", line 1041, in __init__
    super().__init__(model, peft_config, adapter_name) 
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/peft_model.py", line 123, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 119, in __init__
    super().__init__(model, config, adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 95, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 233, in inject_adapter
    new_module = ModulesToSaveWrapper(target, adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/utils/other.py", line 177, in __init__
    self.update(adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/utils/other.py", line 200, in update
    self.modules_to_save[adapter_name].requires_grad_(True)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2440, in requires_grad_
    p.requires_grad_(requires_grad)
RuntimeError: only Tensors of floating point dtype can require gradients
Traceback (most recent call last):
  File "/home/radhachitta/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.10', '-m', 'axolotl.cli.train', 'examples/falcon/config-7b-qlora.yml']' returned non-zero exit status 1.

Steps to reproduce

I ran the command
accelerate launch -m axolotl.cli.train examples/falcon/config-7b-qlora.yml

with the changes to the YAML described above.

Config yaml

This is the final config-7b-qlora.yml, which results in the last error:

# 1b: tiiuae/falcon-rw-1b
# 40b: tiiuae/falcon-40b
base_model: tiiuae/falcon-7b
# required by falcon custom model code: https://huggingface.co/tiiuae/falcon-7b/tree/main
trust_remote_code: false
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_falcon_derived_model: true
load_in_8bit: false
# enable 4bit for QLoRA
load_in_4bit: true
gptq: false
strict: false
push_dataset_to_hub:
datasets:
  - path: QingyiSi/Alpaca-CoT
    data_files:
      - Chain-of-Thought/formatted_cot_data/gsm8k_train.json
    type: "alpaca:chat"
dataset_prepared_path:
val_set_size: 0.05
# enable QLoRA
adapter: qlora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len:

# hyperparameters from QLoRA paper Appendix B.2
# "We find hyperparameters to be largely robust across datasets"
lora_r: 64
lora_alpha: 16
# 0.1 for models up to 13B
# 0.05 for 33B and 65B models
lora_dropout: 0.05
# add LoRA modules on all linear layers of the base model
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
output_dir: ./qlora-out
# QLoRA paper Table 9
# - 16 for 7b & 13b
# - 32 for 33b, 64 for 64b
# Max size tested on A6000
# - 7b: 40
# - 40b: 4
# decrease if OOM, increase for max VRAM utilization
micro_batch_size: 1
gradient_accumulation_steps: 2
num_epochs: 4
# Optimizer for QLoRA
optimizer: paged_adamw_32bit
torchdistx_path:
lr_scheduler: cosine
# QLoRA paper Table 9
# - 2e-4 for 7b & 13b
# - 1e-4 for 33b & 64b
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: true
gradient_checkpointing: true
# stop training after this many evaluation losses have increased in a row
# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
#early_stopping_patience: 3
early_stopping_patience:
lora_modules_to_save: embed_tokens, lm_head
resume_from_checkpoint:
auto_resume_from_checkpoints: true
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:
gptq_groupsize:
gptq_model_v1:
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.000001
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|endoftext|>"
  bos_token: ">>ABSTRACT<<"
  eos_token: "<|endoftext|>"

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10.12

axolotl branch-commit

main v0.3.0

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
radhacr added the bug label on Jan 18, 2024
winglian (Collaborator) commented:

#1149 should resolve this

NanoCode012 (Collaborator) commented:

Hey, regarding this lora_modules_to_save: embed_tokens, lm_head error: I might've replied to you or someone else on Discord.

Could you try it as a list, as shown in the README?
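For example, something like this list form (a sketch based on the README; adjust to the exact syntax shown there):

lora_modules_to_save:
  - embed_tokens
  - lm_head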

radhacr (Author) commented Jan 24, 2024

> Hey, regarding this lora_modules_to_save: embed_tokens, lm_head error: I might've replied to you or someone else on Discord.
>
> Could you try it as a list, as shown in the README?

Thanks, this worked.

radhacr (Author) commented Jan 24, 2024

Marking this issue as resolved.
