
Add Stable Diffusion #828

Merged: 27 commits into keras-team:master on Sep 25, 2022

Conversation

fchollet
Member

This is a working Stable Diffusion text to image model. It reuses a bunch of code from Divam's original implementation. All top-level models are full rewrites as Functional models.

All in all it's about 600 LOC. We can probably go below that with further refactoring. The UNet is too declarative right now; it could be refactored to be a lot more concise.

We need to:

  • Refactor the UNet (DiffusionModel) to be as elegant as possible
  • Figure out what to do with constants.py
  • Upload the new weights files somewhere
  • Further cleanup (remove the temporary test code) and beautify the code
  • Add some basic unit testing

@bhack
Contributor

bhack commented Sep 23, 2022

/cc @divamgupta @innat

@LukeWood
Contributor

So cool, can't wait to have SD ready and in the package...

@bhack
Contributor

bhack commented Sep 23, 2022

As this is the first concrete example where we are handling NLP+vision together, I don't know whether it is worth re-introducing some of our related topics, like:

Quoting @chenmoneygithub
keras-team/keras#16181 (comment)

And you are totally correct, we need to agree on a repo for shared components. We think it makes sense to put things into Keras main repo if both KerasCV and KerasNLP would depend on them.

Me:
#52 (comment) #30 (comment)

Contributor

@LukeWood left a comment


Minor changes

@@ -0,0 +1 @@
from tensorflow_addons.layers import GroupNormalization
Contributor


We need to remove this and duplicate/port/refactor it in the keras-cv or keras repo (#74).

Contributor


@ianstenbit is working on this for now, it’ll live in core Keras - temporarily as a CV internal API

Contributor

@tanzhenyu left a comment


Excited to see this!
A general comment -- are we getting the weights from somewhere, or will we provide a training script later?



def gelu(x):
    tanh_res = keras.activations.tanh(x * 0.7978845608 * (1 + 0.044715 * (x**2)))
Contributor


numbers seem pretty magical :-)

Member Author


This is just a polynomial approximation of gelu. Generally speaking, there are a number of constants hardcoded in the code, because the code is non-configurable beyond the arguments of StableDiffusion and text_to_image. Its only function is to load the original weights and match the original numerics.
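For reference, the "magic" 0.7978845608 is just sqrt(2/pi), from the standard tanh approximation of GELU; a minimal scalar sketch (stdlib-only, the function name is illustrative):

```python
import math

def gelu_tanh_approx(x):
    # tanh-based approximation of GELU:
    #   0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))
    # Note x * sqrt(2/pi) * (1 + 0.044715 * x**2) in the PR's code is the
    # same quantity, just factored differently.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```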

@LukeWood
Contributor

Excited to see this!
A general comment -- are we getting the weights from somewhere, or will we provide a training script later?

As of now they're ported. If I can set up LAION-5B (hoping to in the next few months…) we can retrain; LAION-5B was the dataset this model was originally trained on.

@LukeWood
Contributor

Right now CLIP weight downloading is not lazy:

[screenshot]

Contributor

@LukeWood left a comment


Don't forget to add an entry to keras_cv/models/__init__.py!

@LukeWood
Contributor

LukeWood commented Sep 24, 2022

[screenshot]

Managed to get it working in a notebook!

@bhack
Contributor

bhack commented Sep 24, 2022

Excited to see this!
A general comment -- are we getting the weights from somewhere, or will we provide a training script later?

As of now they're ported. If I can set up LAION-5B (hoping to in the next few months…) we can retrain; LAION-5B was the dataset this model was originally trained on.

Are we going to target recent OpenCLIP releases?

https://laion.ai/blog/large-openclip/

@@ -0,0 +1,138 @@
import numpy as np
Contributor

@bhack Sep 24, 2022


Are we sure that we want to handle all these NumPy objects and tf.experimental.numpy-wrapped ops instead of handling Tensors and TF ops directly?

Contributor

@bhack Sep 24, 2022


I am asking this also for coherence:

grep -r "import numpy" * --exclude="*test.py" --exclude-dir="*examples*" --exclude-dir="*test*" --exclude-dir="*benchmarks*"
keras_cv/layers/preprocessing/random_rotation.py:import numpy as np
keras_cv/models/object_detection/__internal__.py:import numpy as np
keras_cv/models/object_detection/retina_net/retina_net.py:import numpy as np

Even in these few cases, numpy was used only for np.log and np.pi.
(I don't know if there was a real case where we could not use pi from Python's math module and/or tf.math.log.)
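For the scalar usages in question, the stdlib is a drop-in replacement (tf.math.log would cover Tensor inputs); a minimal sketch, with an illustrative Gaussian log-normalizer term as the example expression:

```python
import math

# np.pi is a plain Python float; math.pi is identical and drops the numpy
# import entirely. math.log matches np.log on Python scalars; tf.math.log
# is the TF-native replacement when the input is a Tensor.
half_log_two_pi = 0.5 * math.log(2.0 * math.pi)
```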



class SimpleTokenizer:
    def __init__(self, bpe_path: str = default_bpe()):
Contributor


Here's the bug causing your lazy loading not to work: the default argument is evaluated at import time. Suggested fix:

    bpe_path=None

    bpe_path = bpe_path or default_bpe()

@YaoYuanxin

Hi, I have a fairly basic question. I installed the updated version of keras_cv, but it's showing "no attribute".
[screenshot]

Is it because StableDiffusion hasn't been officially released for keras_cv yet?

@@ -126,13 +141,3 @@ def get_initial_parameters(self, timesteps, batch_size, seed=None):
(batch_size, self.img_height // 8, self.img_width // 8, 4), seed=seed
)
return noise, alphas, alphas_prev

Contributor


Feel free to add this to examples/models/stable_diffusion/

Right now they're basically glorified debugging demos, but we could polish and render them properly down the line.

@fchollet
Member Author

I added a docstring, removed the TFA dependency, added a code example, and disabled the golden value test so that it doesn't run on CI (which would be overly expensive -- we don't want to download the weights on CI).

I believe we're ready to go, or pretty close.
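One common way to keep such a golden-value test out of CI, sketched here with an assumed `RUN_LARGE_TESTS` env var (the PR's actual mechanism may differ):

```python
import os
import unittest

# Skip the weight-downloading golden-value test unless explicitly enabled,
# so CI never pulls the large pretrained weights.
RUN_LARGE = os.environ.get("RUN_LARGE_TESTS") == "1"

class StableDiffusionGoldenTest(unittest.TestCase):
    @unittest.skipUnless(RUN_LARGE, "requires downloading pretrained weights")
    def test_golden_values(self):
        # Would download weights and compare outputs against golden values.
        ...
```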

@LukeWood
Contributor

/gcbrun

@divamgupta

There will be a lot of transformer-based models released in the future. Should there be separate modules for transformers?

Contributor

@LukeWood left a comment


I have reviewed the full PR, and this LGTM. I have also pulled locally and played with the implementation; great job on the readability and organization.

@LukeWood
Contributor

@fchollet you also need to add "regex" to install_requires in setup.py

@fchollet
Member Author

Somehow I'm able to run format.sh and lint.sh locally just fine, yet it fails on CI.

@LukeWood
Contributor

/gcbrun

@LukeWood
Contributor

Adding a dependency to our GCB cluster; one moment...

@LukeWood
Contributor

/gcbrun

@LukeWood
Contributor

All checks pass @fchollet ! Thank you for working so swiftly to prepare this port! Merging and rebasing #831 on top.

Excited to have such a powerful offering in our package; and can't wait to show off the performance in the keras.io guide.

@LukeWood LukeWood merged commit e0c4053 into keras-team:master Sep 25, 2022
@bhack bhack mentioned this pull request Sep 25, 2022
ghost pushed a commit to y-vectorfield/keras-cv that referenced this pull request Nov 16, 2023
* Add Stable Diffusion

* Further simplification; add files

* Update imports

* Further minor simplications

* Style fixes

* Further beautification. The code is now 500 LOC excluding the tokenizer and the constants file.

* Readability improvements

* Improve generation loop

* Simplify generation code

* Fix bpe_path

* Add init imports

* Minor style fixes

* Remove unnecessary dependencies and add file headers

* Add test

* Add example

* Add group normalization layer

* Disable test so it doesn't run on CI

* Fix code style

* Update docs.

* Format imports

* Remove unused import

* Add file header

* Add more copyright notices

* Add last copyright notice

* Add regex requirement

* Hopefully last copyright notice