Commit

Update README.md
xiangning-chen committed Apr 26, 2023
1 parent ab8cf0e commit 5dd35ff
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions lion/README.md
@@ -3,9 +3,9 @@

This repository contains JAX, TensorFlow and PyTorch implementations of the Lion optimizer discovered by symbolic program search in the [Symbolic Discovery of Optimization Algorithms](https://arxiv.org/abs/2302.06675) paper.

Lion is also successfully deployed in production systems such as Google’s search ads CTR model.
<!-- Lion is also successfully deployed in production systems such as Google’s search ads CTR model. -->

Lion is available on multiple codebases, including [Praxis](https://github.com/google/praxis), [Optax](https://github.com/deepmind/optax), [Keras](https://github.com/keras-team/keras/blob/901950201d867c85ec34f4d0c9201aea2c15a65d/keras/optimizers/lion.py), [Timm](https://github.com/huggingface/pytorch-image-models/blob/main/timm/optim/lion.py), and a popular [PyTorch implementation by lucidrains](https://github.com/lucidrains/lion-pytorch).
Lion is available on multiple codebases, including [Praxis](https://github.com/google/praxis), [Optax](https://github.com/deepmind/optax), [Keras](https://github.com/keras-team/keras/blob/901950201d867c85ec34f4d0c9201aea2c15a65d/keras/optimizers/lion.py), [Timm](https://github.com/huggingface/pytorch-image-models/blob/main/timm/optim/lion.py), [T5X](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/examples/base_wmt_from_scratch_lion.gin), and a popular [PyTorch implementation by lucidrains](https://github.com/lucidrains/lion-pytorch).

## Table of Contents

@@ -85,8 +85,8 @@ Additionally, the $\epsilon$ in AdamW is set as $1e-6$ instead of the default $1e-8$

- The update generated by Lion is an element-wise binary $\pm 1$, as a result of the sign operation, therefore it has a larger norm than those generated by other optimizers.
Based on our experience, `a suitable learning rate for Lion is typically 3-10x smaller than that for AdamW.`
Note that the initial value, peak value, and end value in the learning rate schedule should be changed `simultaneously` with the same ratio compared to AdamW.
We `do not` modify other training settings such as learning rate schedule, gradient and update clipping.
Note that the initial value, peak value, and end value of the learning rate should be changed `simultaneously` with the same ratio compared to AdamW.
We `do not` modify other training settings such as the learning rate schedule, gradient and update clipping.
Since the effective weight decay is $lr * \lambda$, `the value of $\lambda$ used for Lion is 3-10x larger than that for AdamW in order to maintain a similar strength.`
For instance,
- $lr=1e-4$, $\lambda=10.0$ in Lion and $lr=1e-3$, $\lambda=1.0$ in AdamW when training ViT-B/16 on ImageNet with strong augmentations,

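For illustration (not part of the diff above): a minimal sketch of the hyperparameter guidance in the changed section, assuming the lucidrains `lion-pytorch` package linked in the README and a stand-in `torch.nn.Linear` model; the concrete values follow the ViT-B/16 example listed above.

```python
# Hedged sketch: contrasting the suggested Lion settings with an AdamW baseline,
# following the ViT-B/16 ImageNet example above. Assumes `pip install lion-pytorch`.
import torch
from lion_pytorch import Lion

model = torch.nn.Linear(768, 1000)  # stand-in for a real model such as ViT-B/16

# AdamW baseline from the example: lr=1e-3, weight decay lambda=1.0,
# with epsilon set to 1e-6 as noted in the README.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0, eps=1e-6)

# Lion's update is an element-wise sign, so every coordinate has magnitude 1;
# the learning rate is therefore 3-10x smaller (here 10x), and the weight decay
# lambda is 10x larger so the effective decay lr * lambda stays the same:
# 1e-4 * 10.0 == 1e-3 * 1.0 == 1e-3.
lion = Lion(model.parameters(), lr=1e-4, weight_decay=10.0)
```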