
Llama3 Lora training fails to output and save #1650

Open · 6 of 8 tasks
austinm1120 opened this issue May 23, 2024 · 0 comments
Labels
bug Something isn't working


austinm1120 commented May 23, 2024

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Train a Llama 3 based model on my dataset and save the resulting LoRA adapter to the output folder.

Current behaviour

The model completes training and logs that it is saving, then prints a large amount of text and never writes the model to the output directory.

Steps to reproduce

accelerate launch -m axolotl.cli.train examples/llama-3/lora-8b.yml

The model loads and trains normally.

When training finishes, the run reports that it is saving, but only prints a large amount of text and never writes the adapter:

[INFO] [axolotl.train.train:173] [PID:1578] [RANK:0] Training Completed!!! Saving pre-trained model to ./outputs/lora-out

[two screenshots of the console output during the save step]
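For reference, here is a quick sketch for checking what actually lands in the output directory after the run. The path assumes the output_dir from the config below, and the expected file names are the ones peft normally writes for a LoRA adapter, so treat this as a diagnostic sketch rather than axolotl's own check:

import os

# output_dir from the config below
out_dir = "./outputs/lora-out"

# Files a successful LoRA adapter save would normally produce (assumption based on peft defaults).
expected = {"adapter_config.json", "adapter_model.safetensors", "adapter_model.bin"}

present = set(os.listdir(out_dir)) if os.path.isdir(out_dir) else set()
print("Found:", sorted(present))
print("Expected adapter files missing:", sorted(expected - present))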

Config yaml

base_model: Orenguteng/Llama-3-8B-Lexi-Uncensored
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: output.jsonl
    type: sharegpt
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>

Possible solution

I trained with the same dataset on Llama 2 to rule out the dataset, and that run worked fine; I was even able to download the resulting model, convert it, and run it locally. That isolates the problem to training the LoRA on Llama 3. A possible interim workaround is sketched below.
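As a workaround sketch (this is not the axolotl code path, and the checkpoint directory name below is a placeholder for whatever the last intermediate checkpoint is called), the adapter from the last checkpoint written under output_dir can be re-saved manually with peft:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Base model from the config above; bfloat16 assumed since bf16: auto was used for training.
base = AutoModelForCausalLM.from_pretrained(
    "Orenguteng/Llama-3-8B-Lexi-Uncensored",
    torch_dtype=torch.bfloat16,
)

# checkpoint-XXX is a placeholder for the last checkpoint directory written during training.
model = PeftModel.from_pretrained(base, "./outputs/lora-out/checkpoint-XXX")

# Writes adapter_config.json and the adapter weights to the output directory.
model.save_pretrained("./outputs/lora-out")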

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10.14

axolotl branch-commit

Main (Running on Jarvis.ai)

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
austinm1120 added the bug (Something isn't working) label on May 23, 2024