set fp16 to false if bf16, update bf16: auto in example YAMLs #1122

Merged (4 commits) on Jan 22, 2024
4 changes: 2 additions & 2 deletions README.md
@@ -460,8 +460,8 @@ See [examples](examples) for quick start. It is recommended to duplicate and mod
```yaml
load_in_4bit: true
load_in_8bit: true
-bf16: true # require >=ampere
-fp16: true
+bf16: auto # require >=ampere, auto will detect if your GPU supports this and choose automatically.
+fp16: # leave empty to use fp16 when bf16 is 'auto'. set to false if you want to fallback to fp32
tf32: true # require >=ampere
bfloat16: true # require >=ampere, use instead of bf16 when you don't want AMP (automatic mixed precision)
float16: true # use instead of fp16 when you don't want AMP
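The README change above documents what `bf16: auto` implies: a runtime capability check, with fp16 as the fallback when the GPU lacks bfloat16 support. Below is a minimal sketch of that resolution logic in PyTorch; the helper name is hypothetical and not taken from axolotl's source, but `torch.cuda.is_bf16_supported()` is the standard probe for bfloat16 support on Ampere-or-newer GPUs.

```python
import torch

def resolve_precision(bf16, fp16):
    """Hypothetical helper illustrating the documented semantics:
    bf16: auto -> use bfloat16 only if the GPU supports it
    fp16 empty -> allow fp16 as the fallback; fp16: false forces fp32
    """
    if bf16 == "auto":
        # Ampere and newer GPUs report bfloat16 support here
        bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    if bf16:
        return "bfloat16"
    if fp16 is None or fp16:
        return "float16"
    return "float32"

print(resolve_precision("auto", None))  # "bfloat16" on an A100, "float16" on e.g. a T4
```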
4 changes: 2 additions & 2 deletions examples/cerebras/btlm-ft.yml
@@ -53,8 +53,8 @@ lr_quadratic_warmup: true
learning_rate: 0.000085
train_on_inputs: true
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing: false
4 changes: 2 additions & 2 deletions examples/cerebras/qlora.yml
@@ -36,8 +36,8 @@ lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
4 changes: 2 additions & 2 deletions examples/code-llama/13b/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/13b/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/34b/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/34b/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/7b/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/7b/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/falcon/config-7b-lora.yml
@@ -38,8 +38,8 @@ lr_scheduler: cosine
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
4 changes: 2 additions & 2 deletions examples/falcon/config-7b-qlora.yml
@@ -64,8 +64,8 @@ lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
# stop training after this many evaluation losses have increased in a row
4 changes: 2 additions & 2 deletions examples/falcon/config-7b.yml
@@ -38,8 +38,8 @@ lr_scheduler: cosine
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
4 changes: 2 additions & 2 deletions examples/gptj/qlora.yml
@@ -33,8 +33,8 @@ lr_scheduler: cosine
learning_rate: 0.0001
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
2 changes: 1 addition & 1 deletion examples/jeopardy-bot/config.yml
@@ -31,7 +31,7 @@ lr_scheduler: cosine
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
early_stopping_patience:
resume_from_checkpoint:
4 changes: 2 additions & 2 deletions examples/llama-2/fft_optimized.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/llama-2/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/llama-2/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/llama-2/relora.yml
@@ -47,8 +47,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/mamba/config.yml
@@ -34,8 +34,8 @@ learning_rate: 5e-5
train_on_inputs: false
group_by_length: true

-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing: false
4 changes: 2 additions & 2 deletions examples/mistral/config.yml
@@ -34,8 +34,8 @@ learning_rate: 0.000005

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/mistral/mixtral.yml
@@ -63,8 +63,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/mistral/qlora.yml
@@ -50,8 +50,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
2 changes: 1 addition & 1 deletion examples/mpt-7b/config.yml
@@ -33,7 +33,7 @@ lr_scheduler: cosine
learning_rate: 0.0000002
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
early_stopping_patience:
resume_from_checkpoint:
4 changes: 2 additions & 2 deletions examples/phi/phi-ft.yml
@@ -46,8 +46,8 @@ learning_rate: 0.000003

train_on_inputs: false
group_by_length: true
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing:
4 changes: 2 additions & 2 deletions examples/phi/phi-qlora.yml
@@ -46,8 +46,8 @@ learning_rate: 0.000003

train_on_inputs: false
group_by_length: true
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing:
4 changes: 2 additions & 2 deletions examples/phi/phi2-ft.yml
@@ -48,8 +48,8 @@ learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing: true
2 changes: 1 addition & 1 deletion examples/pythia/lora.yml
@@ -27,7 +27,7 @@ num_epochs: 4
learning_rate: 0.00001
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
early_stopping_patience:
resume_from_checkpoint:
4 changes: 2 additions & 2 deletions examples/qwen/lora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: false
4 changes: 2 additions & 2 deletions examples/qwen/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: false
2 changes: 1 addition & 1 deletion examples/redpajama/config-3b.yml
@@ -34,7 +34,7 @@ lr_scheduler: cosine
learning_rate: 0.0000002
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
early_stopping_patience:
resume_from_checkpoint:
2 changes: 1 addition & 1 deletion examples/replit-3b/config-lora.yml
@@ -33,7 +33,7 @@ lr_scheduler:
learning_rate: 0.00001
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
gradient_checkpointing:
early_stopping_patience:
4 changes: 2 additions & 2 deletions examples/tiny-llama/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/tiny-llama/pretrain.yml
@@ -34,8 +34,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/tiny-llama/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/xgen-7b/xgen-7b-8k-qlora.yml
@@ -62,8 +62,8 @@ lr_scheduler: cosine
learning_rate: 0.00002
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false
gradient_checkpointing: true
# stop training after this many evaluation losses have increased in a row
4 changes: 2 additions & 2 deletions examples/yi-34B-chat/qlora.yml
@@ -7,8 +7,8 @@ load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 1024
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false
flash_attention: true
special_tokens: