
LoRA not working with accelerate + bfloat16 (without load_in_8_bit) #793

Closed
Palmik opened this issue Oct 27, 2023 · 1 comment
Labels
bug Something isn't working

Comments


Palmik commented Oct 27, 2023

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

I should be able to use load_in_8bit: False + load_in_4bit: False + adapter: lora.
This issue was already observed in #456, but that issue was closed without resolution.

Current behaviour

When you try this today, the process gets stuck, e.g.:

[2023-10-27 11:00:34,160] [DEBUG] [axolotl.load_tokenizer:75] [PID:11028] [RANK:0] EOS: 2 / </s>
[2023-10-27 11:00:34,161] [DEBUG] [axolotl.load_tokenizer:76] [PID:11028] [RANK:0] BOS: 1 / <s>
[2023-10-27 11:00:34,161] [DEBUG] [axolotl.load_tokenizer:77] [PID:11028] [RANK:0] PAD: 2 / </s>
[2023-10-27 11:00:34,161] [DEBUG] [axolotl.load_tokenizer:78] [PID:11028] [RANK:0] UNK: 0 / <unk>
[2023-10-27 11:00:34,434] [INFO] [axolotl.load_tokenized_prepared_datasets:129] [PID:11028] [RANK:0] Loading prepared dataset from disk at last_run_prepared/2d315456dea8cf361f2683ece0705571...
[2023-10-27 11:00:34,437] [INFO] [axolotl.load_tokenized_prepared_datasets:131] [PID:11028] [RANK:0] Prepared dataset loaded from disk...
[2023-10-27 11:00:35,053] [INFO] [axolotl.calculate_total_num_steps:245] [PID:11028] [RANK:0] total_num_steps: 149
[2023-10-27 11:00:35,061] [INFO] [axolotl.train.train:46] [PID:11028] [RANK:0] loading tokenizer... mistralai/Mistral-7B-v0.1
[2023-10-27 11:00:35,196] [DEBUG] [axolotl.load_tokenizer:75] [PID:11028] [RANK:0] EOS: 2 / </s>
[2023-10-27 11:00:35,196] [DEBUG] [axolotl.load_tokenizer:76] [PID:11028] [RANK:0] BOS: 1 / <s>
[2023-10-27 11:00:35,198] [DEBUG] [axolotl.load_tokenizer:77] [PID:11028] [RANK:0] PAD: 2 / </s>
[2023-10-27 11:00:35,198] [DEBUG] [axolotl.load_tokenizer:78] [PID:11028] [RANK:0] UNK: 0 / <unk>
[2023-10-27 11:00:35,473] [INFO] [axolotl.train.train:54] [PID:11028] [RANK:0] loading model and (optionally) peft_config...
Loading checkpoint shards: 100%|██████████| 2/2 [00:25<00:00, 12.58s/it]
[2023-10-27 11:03:11,241] [INFO] [axolotl.load_model:438] [PID:11028] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2023-10-27 11:03:11,850] [INFO] [axolotl.load_lora:547] [PID:11028] [RANK:0] found linear modules: ['q_proj', 'o_proj', 'v_proj', 'k_proj', 'down_proj', 'gate_proj', 'up_proj']
trainable params: 167,772,160 || all params: 7,409,504,256 || trainable%: 2.264283199029719
[2023-10-27 11:03:17,041] [INFO] [axolotl.load_model:474] [PID:11028] [RANK:0] GPU memory usage after adapters: 0.000GB ()

(after this, nothing happens)
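
If it helps with debugging: since the run hangs silently, one generic way to find out where it is stuck is to register a faulthandler signal handler before launching training (or to attach py-spy to the hung PID from another shell). This is only a debugging sketch, not axolotl code:

import faulthandler
import signal

# Register a handler so that sending SIGUSR1 to the stuck process
# (e.g. `kill -USR1 <pid>` from another shell) dumps the Python traceback
# of every thread to stderr, showing where the run is blocked.
faulthandler.register(signal.SIGUSR1, all_threads=True)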

Steps to reproduce

.

Config yaml

base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: ...
    type: completion

dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./mistral-7b-lora-out

adapter: lora
lora_model_dir:

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

lora_r: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: true
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 20
eval_table_size:
eval_table_max_new_tokens: 128
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
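
For reference, the lora_* keys above correspond roughly to the following peft LoraConfig (a sketch of the equivalent adapter settings; the exact construction inside axolotl may differ):

from peft import LoraConfig

# Rough one-to-one mapping of the lora_* keys in the config above
# onto peft's LoraConfig (assumption: axolotl maps them directly).
lora_config = LoraConfig(
    r=64,                # lora_r
    lora_alpha=16,       # lora_alpha
    lora_dropout=0.05,   # lora_dropout
    target_modules=["gate_proj", "down_proj", "up_proj",
                    "q_proj", "v_proj", "k_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)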

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10.13

axolotl branch-commit

latest docker

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
Palmik added the bug label Oct 27, 2023
@winglian
Collaborator

This isn't a supported configuration. We only officially support LoRA with 8-bit and qLoRA with 4-bit. If you wish to submit a fix to enable 16-bit LoRA fine-tuning, we would definitely welcome that PR.
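
For anyone interested in picking this up: outside of axolotl, plain 16-bit LoRA fine-tuning is essentially a bfloat16 model load plus get_peft_model. The following is only a sketch with transformers + peft under that assumption, not axolotl's internal code path:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model directly in bfloat16, with no 8-bit/4-bit quantization.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
)

# Same adapter settings as in the report above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # ~2.26% trainable, matching the log in the report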
