add basic support for the optimi adamw optimizer #1727

winglian · 2024-07-05T13:31:03Z

https://optimi.benjaminwarner.dev/kahan_summation/

Kahan Summation

Kahan summation is a technique to reduce the numerical error of adding multiple low precision numbers by accumulating errors in a separate compensation buffer. The addition of the compensation buffer increases the effective summation precision by the precision of the compensation buffer.

Using Kahan summation to improve low precision model training was first introduced by Zamirai et al. in Revisiting BFloat16 Training. Zamirai et al. found that the primary source of numerical error in low precision training is the optimizer’s model weight update step. They add Kahan summation to the SGD and AdamW weight update steps to reduce the update’s numerical inaccuracy, bringing low precision training to parity with full precision training across the tested models.
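To make the weight-update problem concrete, here is a minimal sketch of classic Kahan summation applied to a repeated small update. It is not optimi's implementation. The helper names (`to_half`, `naive_accumulate`, `kahan_accumulate`) are hypothetical, and IEEE 754 half precision (via the standard library's `struct` format `"e"`) stands in for bfloat16, which the standard library does not provide.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def naive_accumulate(weight: float, update: float, steps: int) -> float:
    """Add `update` to `weight` `steps` times, rounding to half precision each step."""
    w = to_half(weight)
    u = to_half(update)
    for _ in range(steps):
        w = to_half(w + u)
    return w


def kahan_accumulate(weight: float, update: float, steps: int) -> float:
    """Same accumulation, but with a half-precision compensation buffer."""
    w = to_half(weight)
    u = to_half(update)
    c = 0.0  # compensation buffer: rounding error carried over from earlier steps
    for _ in range(steps):
        y = to_half(u - c)               # fold the carried error into this update
        t = to_half(w + y)               # low precision add; may drop part of y
        c = to_half(to_half(t - w) - y)  # what was actually added minus what was intended
        w = t
    return w


if __name__ == "__main__":
    print("naive:", naive_accumulate(1.0, 1e-4, 1000))
    print("kahan:", kahan_accumulate(1.0, 1e-4, 1000))
```

With a weight of 1.0 and an update of 1e-4, the naive loop never moves off 1.0, because the update is smaller than half the spacing between adjacent half-precision values near 1.0. The compensated loop ends close to the exact result of 1.1, since the dropped portion of each update is carried in the buffer until it is large enough to register.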

winglian force-pushed the optimi-optimizer branch 2 times, most recently from efba05e to dfb58f9 on July 9, 2024 18:55

winglian added 5 commits July 12, 2024 21:24

add support for optimi_adamw optimizer w kahan summation

4b04cd4

pydantic validator for optimi_adamw

785731f

workaround for setting optimizer for fsdp

ba2ed0a

make sure to install optimizer packages

99c1985

make sure to have parity for model parameters passed to optimizer

5c36692

winglian force-pushed the optimi-optimizer branch from dfb58f9 to 5c36692 on July 13, 2024 01:24

winglian added the ready to merge label Jul 13, 2024
