Commit

Update README.md
xiangning-chen committed Apr 26, 2023
1 parent ab8cf0e commit 5dd35ff
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions lion/README.md
@@ -3,9 +3,9 @@

This repository contains JAX, TensorFlow and PyTorch implementations of the Lion optimizer discovered by symbolic program search in the [Symbolic Discovery of Optimization Algorithms](https://arxiv.org/abs/2302.06675) paper.

Lion is also successfully deployed in production systems such as Google’s search ads CTR model.
<!-- Lion is also successfully deployed in production systems such as Google’s search ads CTR model. -->

Lion is available on multiple codebases, including [Praxis](https://github.com/google/praxis), [Optax](https://github.com/deepmind/optax), [Keras](https://github.com/keras-team/keras/blob/901950201d867c85ec34f4d0c9201aea2c15a65d/keras/optimizers/lion.py), [Timm](https://github.com/huggingface/pytorch-image-models/blob/main/timm/optim/lion.py), and a popular [PyTorch implementation by lucidrains](https://github.com/lucidrains/lion-pytorch).
Lion is available on multiple codebases, including [Praxis](https://github.com/google/praxis), [Optax](https://github.com/deepmind/optax), [Keras](https://github.com/keras-team/keras/blob/901950201d867c85ec34f4d0c9201aea2c15a65d/keras/optimizers/lion.py), [Timm](https://github.com/huggingface/pytorch-image-models/blob/main/timm/optim/lion.py), [T5X](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/examples/base_wmt_from_scratch_lion.gin), and a popular [PyTorch implementation by lucidrains](https://github.com/lucidrains/lion-pytorch).

## Table of Contents

@@ -85,8 +85,8 @@ Additionally, the $\epsilon$ in AdamW is set as $1e-6$ instead of the default $1e-8$

- The update generated by Lion is an element-wise binary $\pm 1$, as a result of the sign operation, therefore it has a larger norm than those generated by other optimizers.
Based on our experience, `a suitable learning rate for Lion is typically 3-10x smaller than that for AdamW.`
Note that the initial value, peak value, and end value in the learning rate schedule should be changed `simultaneously` with the same ratio compared to AdamW.
We `do not` modify other training settings such as learning rate schedule, gradient and update clipping.
Note that the initial value, peak value, and end value of the learning rate should be changed `simultaneously` with the same ratio compared to AdamW.
We `do not` modify other training settings such as the learning rate schedule, gradient and update clipping.
Since the effective weight decay is $lr * \lambda$, `the value of $\lambda$ used for Lion is 3-10x larger than that for AdamW in order to maintain a similar strength.`
For instance,
- $lr=1e-4$, $\lambda=10.0$ in Lion and $lr=1e-3$, $\lambda=1.0$ in AdamW when training ViT-B/16 on ImageNet with strong augmentations,

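For illustration (not part of the diff above): a minimal sketch of the hyperparameter guidance in the changed section, assuming the lucidrains `lion-pytorch` package linked in the README and a stand-in `torch.nn.Linear` model; the concrete values follow the ViT-B/16 example listed above.

```python
# Hedged sketch: contrasting the suggested Lion settings with an AdamW baseline,
# following the ViT-B/16 ImageNet example above. Assumes `pip install lion-pytorch`.
import torch
from lion_pytorch import Lion

model = torch.nn.Linear(768, 1000)  # stand-in for a real model such as ViT-B/16

# AdamW baseline from the example: lr=1e-3, weight decay lambda=1.0,
# with epsilon set to 1e-6 as noted in the README.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0, eps=1e-6)

# Lion's update is an element-wise sign, so every coordinate has magnitude 1;
# the learning rate is therefore 3-10x smaller (here 10x), and the weight decay
# lambda is 10x larger so the effective decay lr * lambda stays the same:
# 1e-4 * 10.0 == 1e-3 * 1.0 == 1e-3.
lion = Lion(model.parameters(), lr=1e-4, weight_decay=10.0)
```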