
Commit

set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122) [skip ci]

* set fp16 to false if bf16, update bf16: auto in example YAMLs

* unset fp16 so that it falls back properly if bf16 isn't available

* Update README.md [skip-ci]

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* test that bf16 disables fp16

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
winglian and NanoCode012 committed Jan 22, 2024
1 parent eaaeefc commit 782b6a4
Showing 38 changed files with 86 additions and 67 deletions.
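The change described above amounts to: `bf16: auto` enables bf16 only on GPUs that support it, an enabled bf16 forces fp16 off, and an empty `fp16:` lets training fall back to fp16 when bf16 isn't available (or to fp32 if `fp16: false` is set). Below is a minimal sketch of that resolution logic for illustration only; the `resolve_precision` helper and the config shape are hypothetical, not axolotl's actual code, and `torch.cuda.is_bf16_supported()` is a real PyTorch call.

```python
# Illustrative sketch of the precision resolution described by this commit.
# The helper name and config shape are hypothetical, not axolotl's actual code.
import torch


def resolve_precision(cfg: dict) -> dict:
    """Resolve bf16/fp16 flags the way this commit describes:
    - bf16: "auto"  -> enable bf16 only if the current GPU supports it
    - bf16 enabled  -> force fp16 off (don't request both AMP modes)
    - bf16 unavailable and fp16 left empty -> fall back to fp16
      (fp16: false would mean falling back to fp32 instead)
    """
    bf16 = cfg.get("bf16")
    fp16 = cfg.get("fp16")

    if bf16 == "auto":
        bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
        if not bf16 and fp16 is None:
            fp16 = True  # fp16 left empty -> use fp16 when bf16 isn't available

    if bf16:
        fp16 = False  # bf16 wins; fp16 must be disabled

    return {"bf16": bool(bf16), "fp16": bool(fp16), "tf32": bool(cfg.get("tf32"))}
```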
4 changes: 2 additions & 2 deletions README.md
@@ -464,8 +464,8 @@ See [examples](examples) for quick start. It is recommended to duplicate and mod
```yaml
load_in_4bit: true
load_in_8bit: true
-bf16: true # require >=ampere
-fp16: true
+bf16: auto # require >=ampere, auto will detect if your GPU supports this and choose automatically.
+fp16: # leave empty to use fp16 when bf16 is 'auto'. set to false if you want to fallback to fp32
tf32: true # require >=ampere
bfloat16: true # require >=ampere, use instead of bf16 when you don't want AMP (automatic mixed precision)
float16: true # use instead of fp16 when you don't want AMP
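The commit message also mentions a test that bf16 disables fp16. A minimal, hypothetical check in that spirit, exercising the illustrative `resolve_precision` sketch above rather than axolotl's real validation code:

```python
# Hypothetical check in the spirit of "test that bf16 disables fp16".
# It targets the illustrative resolve_precision sketch above, not axolotl's code.
def test_bf16_disables_fp16():
    resolved = resolve_precision({"bf16": True, "fp16": True, "tf32": True})
    assert resolved["bf16"] is True
    assert resolved["fp16"] is False
```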
4 changes: 2 additions & 2 deletions examples/cerebras/btlm-ft.yml
@@ -53,8 +53,8 @@ lr_quadratic_warmup: true
learning_rate: 0.000085
train_on_inputs: true
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing: false
4 changes: 2 additions & 2 deletions examples/cerebras/qlora.yml
@@ -36,8 +36,8 @@ lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
4 changes: 2 additions & 2 deletions examples/code-llama/13b/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/13b/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/34b/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/34b/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/7b/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/code-llama/7b/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/falcon/config-7b-lora.yml
@@ -38,8 +38,8 @@ lr_scheduler: cosine
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
4 changes: 2 additions & 2 deletions examples/falcon/config-7b-qlora.yml
@@ -64,8 +64,8 @@ lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
# stop training after this many evaluation losses have increased in a row
4 changes: 2 additions & 2 deletions examples/falcon/config-7b.yml
@@ -38,8 +38,8 @@ lr_scheduler: cosine
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
4 changes: 2 additions & 2 deletions examples/gptj/qlora.yml
@@ -33,8 +33,8 @@ lr_scheduler: cosine
learning_rate: 0.0001
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
2 changes: 1 addition & 1 deletion examples/jeopardy-bot/config.yml
@@ -31,7 +31,7 @@ lr_scheduler: cosine
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
early_stopping_patience:
resume_from_checkpoint:
4 changes: 2 additions & 2 deletions examples/llama-2/fft_optimized.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/llama-2/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/llama-2/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/llama-2/relora.yml
@@ -47,8 +47,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/mamba/config.yml
@@ -34,8 +34,8 @@ learning_rate: 5e-5
train_on_inputs: false
group_by_length: true

-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing: false
4 changes: 2 additions & 2 deletions examples/mistral/config.yml
@@ -34,8 +34,8 @@ learning_rate: 0.000005

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/mistral/mixtral.yml
@@ -63,8 +63,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/mistral/qlora.yml
@@ -50,8 +50,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
2 changes: 1 addition & 1 deletion examples/mpt-7b/config.yml
@@ -33,7 +33,7 @@ lr_scheduler: cosine
learning_rate: 0.0000002
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
early_stopping_patience:
resume_from_checkpoint:
4 changes: 2 additions & 2 deletions examples/phi/phi-ft.yml
@@ -46,8 +46,8 @@ learning_rate: 0.000003

train_on_inputs: false
group_by_length: true
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing:
4 changes: 2 additions & 2 deletions examples/phi/phi-qlora.yml
@@ -46,8 +46,8 @@ learning_rate: 0.000003

train_on_inputs: false
group_by_length: true
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing:
4 changes: 2 additions & 2 deletions examples/phi/phi2-ft.yml
@@ -49,8 +49,8 @@ learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: true

gradient_checkpointing: true
2 changes: 1 addition & 1 deletion examples/pythia/lora.yml
@@ -27,7 +27,7 @@ num_epochs: 4
learning_rate: 0.00001
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
early_stopping_patience:
resume_from_checkpoint:
4 changes: 2 additions & 2 deletions examples/qwen/lora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: false
4 changes: 2 additions & 2 deletions examples/qwen/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: false
2 changes: 1 addition & 1 deletion examples/redpajama/config-3b.yml
@@ -34,7 +34,7 @@ lr_scheduler: cosine
learning_rate: 0.0000002
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
early_stopping_patience:
resume_from_checkpoint:
2 changes: 1 addition & 1 deletion examples/replit-3b/config-lora.yml
@@ -33,7 +33,7 @@ lr_scheduler:
learning_rate: 0.00001
train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
tf32: true
gradient_checkpointing:
early_stopping_patience:
4 changes: 2 additions & 2 deletions examples/tiny-llama/lora.yml
@@ -41,8 +41,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/tiny-llama/pretrain.yml
@@ -34,8 +34,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/tiny-llama/qlora.yml
@@ -43,8 +43,8 @@ learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false

gradient_checkpointing: true
4 changes: 2 additions & 2 deletions examples/xgen-7b/xgen-7b-8k-qlora.yml
@@ -62,8 +62,8 @@ lr_scheduler: cosine
learning_rate: 0.00002
train_on_inputs: false
group_by_length: false
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false
gradient_checkpointing: true
# stop training after this many evaluation losses have increased in a row
4 changes: 2 additions & 2 deletions examples/yi-34B-chat/qlora.yml
@@ -7,8 +7,8 @@ load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 1024
-bf16: true
-fp16: false
+bf16: auto
+fp16:
tf32: false
flash_attention: true
special_tokens:

0 comments on commit 782b6a4
