Lion 8 bit #188

lucidrains · 2023-03-09T16:09:29Z

per advice from Tim, the plan will be to closely follow the 1-state logic of RMSProp and conditionally branch out for the lion logic

status - Lion8bit is successfully training a small autoregressive transformer for character level enwik8 on my machine

some remaining todos before merging

in pytorch

import torch

@torch.no_grad()
def update_fn(p, grad, exp_avg, lr, wd, beta1, beta2):
    # decoupled (stepwise) weight decay

    p.data.mul_(1 - lr * wd)

    # weight update: sign of the beta1 interpolation between momentum and gradient

    update = exp_avg.clone().mul_(beta1).add(grad, alpha = 1 - beta1).sign_()
    p.add_(update, alpha = -lr)

    # update the momentum running average with beta2, only after the parameters are updated

    exp_avg.mul_(beta2).add_(grad, alpha = 1 - beta2)
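
For a quick sanity check of the update rule above, here is a minimal scalar sketch in plain Python (no torch). The helper names `sign` and `lion_step` are made up for illustration and are not part of the PR; the arithmetic is meant to mirror `update_fn` on a single value.

```python
def sign(x: float) -> float:
    """Return -1.0, 0.0 or 1.0, like torch.sign on a scalar."""
    return float((x > 0) - (x < 0))


def lion_step(p, grad, exp_avg, lr, wd, beta1, beta2):
    """One Lion step on a single scalar parameter (hypothetical helper)."""
    # decoupled weight decay
    p = p * (1 - lr * wd)
    # the step direction uses the OLD momentum, interpolated with beta1
    update = sign(beta1 * exp_avg + (1 - beta1) * grad)
    p = p - lr * update
    # the momentum is refreshed with beta2 only after the parameter update
    exp_avg = beta2 * exp_avg + (1 - beta2) * grad
    return p, exp_avg


p, m = lion_step(p=1.0, grad=0.5, exp_avg=0.0, lr=0.1, wd=0.0, beta1=0.9, beta2=0.99)
```

With zero initial momentum and a positive gradient, the parameter moves by exactly `lr` against the gradient sign, and the momentum picks up `(1 - beta2) * grad`.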

to test

$ git clone https://github.com/lucidrains/bitsandbytes
$ cd bitsandbytes
# follow the make from source instructions above

add tests
add some comments
figure out why tests break and make it pass
consider just copy pasting lion into the repo

lucidrains · 2023-03-09T20:35:57Z

autoregressive enwik8 with lion8bit converged successfully (sans weight decay), even though there are some inaccuracies in kernels.cu

lucidrains · 2023-03-09T21:20:44Z

it may not be hitting the 8bit1state blockwise path though, where state is being smoothed before the update

lucidrains · 2023-03-09T22:37:15Z

weight decay looks good

christallire · 2023-03-10T06:20:00Z

resolves #150

lucidrains · 2023-03-10T15:54:47Z

@christallire you're welcome to build from source, import Lion8bit, and give it a try

seems to be working great, thanks to Tim's surrounding scaffold

lucidrains · 2023-03-10T18:19:21Z

ok, will let this PR sit for a while to gather feedback. probably will revisit it at end of this month and see if we can get it merged

lucidrains · 2023-03-18T17:20:24Z

i think there may still be an issue with the blockwise version, where the momentum update with beta2 is occurring before the actual parameter updates

TimDettmers

Thank you for your work on this! This looks almost good to me. The only thing that is off is the order of the updates that you point out. We can get around this without rewriting the CUDA code by using the gradient variable as a temporary variable to store the current momentum value + the update value for Lion (see other comments). Please add a comment to these lines to help people understand the use of the gradient as a temporary variable (adding more comments is appreciated, sorry for the mess with the undocumented code).
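
To make the ordering issue concrete, here is a rough sketch in plain Python rather than CUDA. It assumes the blockwise kernel works in two phases (state first, parameters second), which is only inferred from the discussion above; the function names are hypothetical. The second function reuses the gradient buffer as scratch space for the step, so the parameter phase no longer depends on the already-updated momentum.

```python
def sign(x: float) -> float:
    return float((x > 0) - (x < 0))


def step_momentum_first(params, grads, state, lr, beta1, beta2):
    """Problematic order: momentum is refreshed before the step is computed."""
    for i, g in enumerate(grads):
        state[i] = beta2 * state[i] + (1 - beta2) * g
    for i, g in enumerate(grads):
        # uses the NEW momentum, which is not what Lion specifies
        params[i] -= lr * sign(beta1 * state[i] + (1 - beta1) * g)


def step_grad_as_temp(params, grads, state, lr, beta1, beta2):
    """Same two phases, but the gradient slot carries the step across them."""
    for i in range(len(grads)):
        g = grads[i]
        # overwrite the gradient slot with the step built from the OLD momentum
        grads[i] = lr * sign(beta1 * state[i] + (1 - beta1) * g)
        state[i] = beta2 * state[i] + (1 - beta2) * g
    for i in range(len(params)):
        params[i] -= grads[i]


# values chosen so the two orderings disagree on the sign
p_wrong, p_right = [0.0], [0.0]
step_momentum_first(p_wrong, [1.0], [-0.112], lr=0.1, beta1=0.9, beta2=0.99)
step_grad_as_temp(p_right, [1.0], [-0.112], lr=0.1, beta1=0.9, beta2=0.99)
```

In this example the old momentum is just negative enough to outvote the gradient, so the intended step is `+lr`; refreshing the momentum first flips the sign and the parameter moves by `-lr` instead.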

The enwik8 test runs are great! I think this shows that it works. It would be great to add a test in test_optim.py for Lion. You need three things for that: (1) define a Lion entry in the str2optimizer dictionary, (2) define an entry in the str2statenames dictionary, (3) add the optimizer name to the optimizer_names list for the test_optimizer8bit function. That should be all that is needed. The only question is which 32-bit baseline to use. You might want to use your own Lion 32-bit repo. I think a fair test could also be to compare against the 32-bit bnb optimizer (since you have shown that 8-bit already replicates enwik8 performance). This might be simpler and quicker to write up.
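
The three registration steps could look roughly like the sketch below. Everything here is a stand-in so it runs without bitsandbytes or a GPU: the two classes, the `"lion8bit_blockwise"` key, the `block_wise` keyword and the value layouts of the dictionaries are assumptions for illustration, and the real test_optim.py may structure them differently.

```python
# Hypothetical stand-ins for the 32-bit baseline and the 8-bit optimizer.
class Lion32bitBaseline:
    def __init__(self, params, **kwargs):
        self.params, self.kwargs = params, kwargs


class Lion8bitCandidate(Lion32bitBaseline):
    pass


# Placeholders for the registries that the test module already defines.
str2optimizer = {}
str2statenames = {}
optimizer_names = ["adam8bit_blockwise"]  # placeholder existing entry

# (1) name -> (32-bit baseline factory, 8-bit factory)
str2optimizer["lion8bit_blockwise"] = (
    Lion32bitBaseline,
    lambda params: Lion8bitCandidate(params, block_wise=True),
)

# (2) name -> state entries to compare; Lion keeps a single state (the momentum)
str2statenames["lion8bit_blockwise"] = ["exp_avg"]

# (3) make the parametrized 8-bit test pick up the new optimizer
optimizer_names.append("lion8bit_blockwise")

baseline_cls, make_8bit = str2optimizer["lion8bit_blockwise"]
candidate = make_8bit([0.0, 1.0])
```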

csrc/kernels.cu

lucidrains · 2023-03-22T01:30:20Z

@TimDettmers thank you Tim for the feedback! 🙏 will address everything you brought up and poke you when it is ready later this week

TimDettmers · 2023-04-11T14:22:40Z

Thank you, this looks good to me. The issue with the test is expected. The error from Lion is expected to be higher at times due to its noisy update. I will merge and will have a look at the test. This is an excellent PR, thank you for all the work. I think it will be invaluable to the community!
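
One plausible reading of the "noisy update" remark (an interpretation, not something stated in the thread): because Lion only keeps the sign of the interpolated momentum, a small error in the stored state near zero can flip that sign and move the parameter by a full `2 * lr`, whereas a plain momentum step would change only in proportion to the error. A minimal sketch with made-up numbers and hypothetical helper names:

```python
def sign(x: float) -> float:
    return float((x > 0) - (x < 0))


def lion_delta(m, g, lr, beta1):
    """Parameter change for a Lion-style sign update."""
    return -lr * sign(beta1 * m + (1 - beta1) * g)


def linear_delta(m, g, lr, beta1):
    """Parameter change for a plain (non-sign) momentum update, for contrast."""
    return -lr * (beta1 * m + (1 - beta1) * g)


lr, beta1, g = 0.1, 0.9, 1.0
m_exact = -0.112      # momentum as a 32-bit optimizer might hold it
m_quantized = -0.110  # same value with a small, made-up quantization error

lion_gap = abs(lion_delta(m_exact, g, lr, beta1) - lion_delta(m_quantized, g, lr, beta1))
linear_gap = abs(linear_delta(m_exact, g, lr, beta1) - linear_delta(m_quantized, g, lr, beta1))
```

Here the sign update differs by `2 * lr` between the two states, while the linear update differs by only about `lr * beta1 * 0.002`, which is why a fixed tolerance that suits other optimizers can occasionally be exceeded for Lion.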

lucidrains · 2024-05-01T19:05:07Z

@TimDettmers oh hey Tim! glad to see this merged

thank you for reviewing it and getting it out there! totally forgot about it, my bad

initial commit, slowly work from interface into the kernel

7247cb4

lucidrains marked this pull request as draft March 9, 2023 16:09

lucidrains changed the title ~~initial commit, slowly work from interface into the kernel~~ (wip) Lion 8 bit Mar 9, 2023

lucidrains mentioned this pull request Mar 9, 2023

add an 8-bit version with bitsandbytes lucidrains/lion-pytorch#17

Open

lucidrains added 3 commits March 9, 2023 09:45

make sure interface is correct

d43ea97

do a bunch of typical bookkeeping before getting to main lion logic

cb4c3c8

forget about tests for now, will test live on local enwik8 training

8de29fc

lucidrains changed the title ~~(wip) Lion 8 bit~~ Lion 8 bit Mar 9, 2023

lucidrains added 3 commits March 9, 2023 11:10

add a sign function, for lion

64bb1ae

use epsilon as beta2 for lion, complete most of the logic in kernel.c…

c83888a

…u for all functions

remove something rmsprop specific

ead570a

lucidrains marked this pull request as ready for review March 9, 2023 20:35

lucidrains mentioned this pull request Mar 9, 2023

Lion optimizer #150

Closed

lucidrains added 2 commits March 9, 2023 14:03

fix weight decay for lion to be decoupled, using a switch

af03430

missed adagrad

c558272

lucidrains added 3 commits March 10, 2023 08:39

swap the order in which momentum and parameters are updated in ops.cu

8618bed

do the epsilon beta2 switcharoo within the cuda code, and not within …

c99b44f

…the python class (so that the state dict still makes sense)

whoops

19b9ef3

lucidrains added 3 commits March 10, 2023 12:50

beta2 is actually accessible in kOptimizerStatic8bit1StateBlockwise

abbe65a

always pass beta2 into all the 1state functions

6c377b3

switch all eps to beta2

369a51c

lucidrains mentioned this pull request Mar 13, 2023

[WIP] Testing the lion optimizer mlfoundations/open_clip#432

Draft

TimDettmers requested changes Mar 21, 2023

lucidrains added 6 commits March 22, 2023 07:52

follow advice of Tim to fix update of momentum vs parameters in block…

9b656f4

…wise 8 bit

add some code in test_optim.py, although it seems to be failing

a43cd20

add some comments, and fix use of g_val

aa9b939

fix consistent tabs / spaces

916000c

another tab/spaces fix

978ba2d

fix comment

2a6828e

TimDettmers merged commit b0ec20c into bitsandbytes-foundation:main Apr 11, 2023

sdbds mentioned this pull request Apr 24, 2023

[Feature]Support for Lion8bit kohya-ss/sd-scripts#443

Closed
