
Phi2 multipack #1173

Merged
merged 11 commits on Jan 23, 2024
19 changes: 8 additions & 11 deletions examples/phi/phi-ft.yml
@@ -1,8 +1,6 @@
base_model: microsoft/phi-1_5
model_type: PhiForCausalLM
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: false
trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
@@ -18,7 +16,7 @@ output_dir: ./phi-sft-out

sequence_len: 2048
sample_packing: true
pad_to_sequence_len:
pad_to_sequence_len: true

adapter:
lora_model_dir:
@@ -35,7 +33,7 @@ wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_torch
adam_beta2: 0.95
@@ -45,18 +43,20 @@ lr_scheduler: cosine
learning_rate: 0.000003

train_on_inputs: false
group_by_length: true
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing:
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: True
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 4
@@ -68,7 +68,4 @@ fsdp:
fsdp_config:
resize_token_embeddings_to_32x: true
special_tokens:
bos_token: "<|endoftext|>"
eos_token: "<|endoftext|>"
unk_token: "<|endoftext|>"
pad_token: "<|endoftext|>"
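A note on what these config changes accomplish: "multipack" is sample packing, i.e. concatenating several short tokenized examples into a single row of up to `sequence_len` tokens so less compute is wasted on pad tokens, which is why `sample_packing`, `pad_to_sequence_len`, and `flash_attention` are switched on together in these examples. A minimal, framework-agnostic sketch of the packing idea (hypothetical helper and field names, not axolotl's actual implementation):

```python
# Illustrative only: a greedy packer showing what sample packing does.
# axolotl's multipack implementation is more involved (length-balanced
# bin packing, dedicated batch samplers and collators, etc.).
from typing import Dict, List


def pack_examples(
    examples: List[List[int]],
    sequence_len: int = 2048,
    pad_token_id: int = 50256,  # phi reuses <|endoftext|> as the pad token
) -> List[Dict[str, List[int]]]:
    """Concatenate tokenized examples into fixed-length packed rows."""
    rows: List[Dict[str, List[int]]] = []
    ids: List[int] = []
    pos: List[int] = []

    def flush() -> None:
        pad = sequence_len - len(ids)  # mirrors `pad_to_sequence_len: true`
        rows.append(
            {
                "input_ids": ids + [pad_token_id] * pad,
                "position_ids": pos + [0] * pad,
            }
        )

    for example in examples:
        example = example[:sequence_len]  # truncate overly long samples
        if ids and len(ids) + len(example) > sequence_len:
            flush()
            ids, pos = [], []
        ids = ids + example
        # position ids restart for each packed sample so a packing-aware
        # attention implementation can treat the samples independently
        pos = pos + list(range(len(example)))

    if ids:
        flush()
    return rows
```

Roughly speaking, with `micro_batch_size: 2` each forward pass then sees two packed rows instead of two individual, mostly padded examples.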
21 changes: 9 additions & 12 deletions examples/phi/phi-qlora.yml
@@ -1,8 +1,6 @@
base_model: microsoft/phi-1_5
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: false
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
@@ -16,9 +14,9 @@ dataset_prepared_path:
val_set_size: 0.05
output_dir: ./phi-sft-out

sequence_len: 1024
sample_packing: false # not CURRENTLY compatible with LoRAs
pad_to_sequence_len:
sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

adapter: qlora
lora_model_dir:
@@ -35,7 +33,7 @@ wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_torch
adam_beta2: 0.95
@@ -45,18 +43,20 @@ lr_scheduler: cosine
learning_rate: 0.000003

train_on_inputs: false
group_by_length: true
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing:
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: True
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 4
@@ -68,7 +68,4 @@ fsdp:
fsdp_config:
resize_token_embeddings_to_32x: true
special_tokens:
bos_token: "<|endoftext|>"
eos_token: "<|endoftext|>"
unk_token: "<|endoftext|>"
pad_token: "<|endoftext|>"
25 changes: 11 additions & 14 deletions examples/phi/phi2-ft.yml
@@ -1,8 +1,6 @@
base_model: microsoft/phi-2
model_revision: 834565c # pin model repo to the previous architecture
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
@@ -17,19 +15,16 @@ val_set_size: 0.05
output_dir: ./phi-sft-out

sequence_len: 2048
sample_packing: false # currently unsupported
pad_to_sequence_len:
sample_packing: true
pad_to_sequence_len: true

adapter:
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
lora_target_linear: true
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:
lora_modules_to_save:
- embd
- lm_head

wandb_project:
wandb_entity:
@@ -38,14 +33,14 @@ wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
micro_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_8bit
optimizer: adamw_torch
adam_beta2: 0.95
adam_epsilon: 0.00001
max_grad_norm: 1.0
lr_scheduler: cosine
learning_rate: 1e-5
learning_rate: 0.000003

train_on_inputs: false
group_by_length: false
@@ -54,6 +49,8 @@ fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: True
early_stopping_patience:
resume_from_checkpoint:
local_rank:
2 changes: 1 addition & 1 deletion src/axolotl/core/trainer_builder.py
@@ -930,7 +930,7 @@ def build_collator(
]
]
if use_batch_sampler_collator:
if self.cfg.model_config_type in ["mixtral", "qwen2"]:
if self.cfg.model_config_type in ["mixtral", "qwen2", "falcon", "phi"]:
collator = V2BatchSamplerDataCollatorForSeq2Seq
else:
collator = BatchSamplerDataCollatorForSeq2Seq
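On the trainer side, packed rows reach the model through a batch-sampler-aware collator: the sampler hands the DataLoader a group of examples per packed row, and the collator concatenates each group and (roughly) marks where one sample ends and the next begins. This change routes `phi` (along with `falcon`) to the V2 variant already used for mixtral and qwen2. A simplified sketch of the pattern (illustrative only, not the actual `V2BatchSamplerDataCollatorForSeq2Seq`):

```python
from typing import Dict, List

# Illustrative only: a stripped-down batch-sampler collator. Each element of
# `groups` is one packed row's worth of samples, as chosen by the batch
# sampler; samples are concatenated and tagged with a per-sample segment id
# in attention_mask so a packing-aware attention patch can keep them from
# attending to each other. The real collators in axolotl handle more cases.


def collate_packed(
    groups: List[List[Dict[str, List[int]]]],
    sequence_len: int,
    pad_token_id: int,
) -> Dict[str, List[List[int]]]:
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for group in groups:
        input_ids: List[int] = []
        attention_mask: List[int] = []
        labels: List[int] = []
        for segment_id, sample in enumerate(group, start=1):
            input_ids += sample["input_ids"]
            labels += sample.get("labels", sample["input_ids"])
            attention_mask += [segment_id] * len(sample["input_ids"])
        pad = sequence_len - len(input_ids)
        batch["input_ids"].append(input_ids + [pad_token_id] * pad)
        batch["attention_mask"].append(attention_mask + [0] * pad)
        batch["labels"].append(labels + [-100] * pad)  # -100 is ignored by the loss
    return batch
```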
8 changes: 0 additions & 8 deletions src/axolotl/models/phi/__init__.py

This file was deleted.

63 changes: 0 additions & 63 deletions src/axolotl/models/phi/configuration_mixformer_sequential.py

This file was deleted.

65 changes: 0 additions & 65 deletions src/axolotl/models/phi/configuration_phi.py

This file was deleted.
