
PictureColorDiffusion is a program that automates 2D colorization of grayscale drawings using the AUTOMATIC1111 Stable Diffusion WebUI API, its interrogation feature, and the ControlNet extension. Additional features such as YoloV8 segmentation are also available.


Story

PictureColorDiffusion was born after multiple attempts to colorize 2D grayscale images, mainly manga and comics, by using and editing other open source projects (such as GAN models working on LAB channels). After poor results, I tried Stable Diffusion's img2img generation; ControlNet quickly joined the process and I switched to txt2img generation. Once I found settings that generally worked well across models, I decided to automate the generation with an application, as well as add some options on the application side to improve the end result.

What is PictureColorDiffusion?

PictureColorDiffusion is a program that automates 2D colorization of drawings / manga / comics using the Stable Diffusion WebUI API, its interrogation feature, the ControlNet extension, and other features on the application side.
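
Under the hood, this maps onto two WebUI API calls: an interrogation request that turns the grayscale picture into danbooru-style tags, and a txt2img request whose ControlNet unit is fed the original image so the generation keeps its composition. The sketch below is only a minimal illustration of that flow, not PictureColorDiffusion's actual code; the payload keys follow a common version of the WebUI / ControlNet extension API, and the module and model names are placeholders you would swap for your own.

    import base64
    import requests

    WEBUI = "http://127.0.0.1:7860"  # WebUI started with --api

    def to_b64(path):
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode()

    grayscale = to_b64("page.png")  # hypothetical input file

    # 1) Interrogate the grayscale image to get danbooru-style tags.
    tags = requests.post(
        f"{WEBUI}/sdapi/v1/interrogate",
        json={"image": grayscale, "model": "deepdanbooru"},
    ).json()["caption"]

    # 2) txt2img guided by ControlNet so the composition follows the input lines.
    payload = {
        "prompt": tags + ", colored",
        "negative_prompt": "monochrome, greyscale",
        "width": 512,
        "height": 768,
        "steps": 28,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "input_image": grayscale,
                    "module": "lineart_anime",                     # preprocessor (placeholder)
                    "model": "control_v11p_sd15s2_lineart_anime",  # must match your SD version
                }]
            }
        },
    }
    result = requests.post(f"{WEBUI}/sdapi/v1/txt2img", json=payload).json()

    with open("colorized.png", "wb") as f:
        f.write(base64.b64decode(result["images"][0]))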

Requirements

  • AUTOMATIC1111 Stable Diffusion WebUI (can be run locally or remotely, for example on Google Colab)
    • Needs to be run with the --api argument (a quick way to verify this is sketched after this list).
  • ControlNet extension for the Stable Diffusion WebUI
  • A SD / SDXL model related to 2D drawings or anime, preferably trained on danbooru tags, like AOM3.
    • This model needs to be put into the models\Stable-Diffusion directory of the AUTOMATIC1111 Stable Diffusion WebUI.
    • A VAE model, if there isn't one baked into the SD / SDXL model. For SD1.x based models like AOM3, the stabilityai mse-840000-ema VAE seems to give good results. The VAE model needs to be put into the models\VAE directory of the AUTOMATIC1111 Stable Diffusion WebUI.
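
To quickly confirm that the WebUI API is reachable and that the ControlNet extension is installed, a check along these lines can help (a rough sketch, assuming the default local URL and the standard /sdapi and /controlnet routes):

    import requests

    WEBUI = "http://127.0.0.1:7860"  # or your remote / Google Colab URL

    # Returns 404 if the WebUI was started without --api.
    sd_models = requests.get(f"{WEBUI}/sdapi/v1/sd-models").json()
    print("SD / SDXL checkpoints:", [m["model_name"] for m in sd_models])

    # Route added by the ControlNet extension; a 404 here usually means it isn't installed.
    cn_models = requests.get(f"{WEBUI}/controlnet/model_list").json()["model_list"]
    print("ControlNet models:", cn_models)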

Tip

You can bypass the Stable Diffusion API endpoint verification in the application with the shortcut Ctrl+Shift+B, but keep in mind that some issues will arise if you do so: the colorization of images won't work, but you will still be able to try your YoloV8 model by right-clicking on the inference button.

Installation

For AUTOMATIC1111 Stable Diffusion WebUI installation, please read their own Wiki.

For the configuration of the ControlNet Extension, please read their own Wiki.

You can download the latest build of PictureColorDiffusion by clicking here. To run the application, unzip the release.zip file and execute PictureColorDiffusion.exe.

Application Features

This is a list of features implemented directly in PictureColorDiffusion.

  • Dynamic resizing of the image size depending on the selected mode.
  • Interrogation model (deepdanbooru) filter for bad words.
  • YoloV8 image segmentation

    Performs image segmentation on the input picture with a YoloV8 ONNX model to keep parts of the original image in the output image (a minimal sketch of the compositing idea follows the note below). I've created an example model for detecting speech bubbles, available on huggingface.

Note

The application does not offer the possibility of targeting specific classes from a YoloV8 model during image segmentation.
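
For illustration, the "keep parts of the original image" step comes down to compositing: wherever the segmentation mask is set, the original pixels replace the generated ones. A rough sketch of that idea, with hypothetical file names and assuming a binary mask was already produced from the YoloV8 detections:

    from PIL import Image

    # Hypothetical files: the grayscale input, the colorized WebUI output,
    # and a white-on-black mask of the regions to preserve (e.g. speech bubbles)
    # produced by the YoloV8 segmentation step.
    original = Image.open("page.png").convert("RGB")
    colorized = Image.open("colorized.png").convert("RGB").resize(original.size)
    mask = Image.open("bubbles_mask.png").convert("L").resize(original.size)

    # Where the mask is white, keep the original pixels; elsewhere keep the colorized ones.
    Image.composite(original, colorized, mask).save("final.png")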

FAQ / Questions

Where to store YoloV8 models?

All YoloV8 models must be placed in the models directory, located in the same directory as the executable. Only ONNX models are supported.

What SD / SDXL model to use in the webui?

I tried to make every mode of the application give reasonably good results with popular 2D/anime related models from huggingface and civitai. In the end, I realised that the results seem to depend on the following:

  • Has the SD / SDXL model been trained on colored images resembling your grayscale image & prompt?
    • Example: a grayscale comic with manga mode, but the model does not know manga-related words well enough.
  • Does the PictureColorDiffusion mode you selected match your grayscale image?
    • Example: manga mode on a drawing could cause poor results and turn the drawing into a manga-like image.

There are some workarounds: you could train a LoRA with colored images of what you want specifically for your model, then use it through the additional prompt section of the application (format: <lora:LORA_NAME_HERE:WEIGHT_HERE>). You can also use the additional prompt & negative prompt sections to add information about what you are trying to colorize. Keeping the Use interrogation feature enabled can also help, as it automatically adds additional information about what you are trying to colorize.
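
As a small illustration (with hypothetical tag and LoRA names), the final prompt is essentially the interrogation tags followed by whatever you typed in the additional prompt section:

    # Hypothetical values: interrogation tags plus the user's additional prompt,
    # which can carry a LoRA reference using the WebUI's <lora:name:weight> syntax.
    interrogated = "1girl, short_hair, school_uniform"
    additional = "vibrant colors, <lora:my_colorization_lora:0.8>"
    prompt = f"{interrogated}, {additional}"
    print(prompt)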

My generated image is completely different from the input image.

If your generated image is completely different from your input image, it means ControlNet probably wasn't used for the generation. You can easily check this by opening your web UI console and searching for errors.

The error typically ends with Exception: ControlNet model [MODEL-NAME](StableDiffusionVersion.SDXL) is not compatible with sd model(StableDiffusionVersion.SD1x) or something similar. In this example, the web UI is indicating that you are using a Stable Diffusion SD1.x model (sd model(StableDiffusionVersion.SD1x)) but that the ControlNet model you selected was made for SDXL (ControlNet model [MODEL-NAME](StableDiffusionVersion.SDXL)). ControlNet therefore failed to load, and the web UI continued without it, generating a picture completely different from the input image. Make sure that your ControlNet model supports the same version as your Stable Diffusion (SD) model.
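
If you want to double-check which checkpoint is loaded and which ControlNet models the web UI can see, the API exposes both (a rough sketch, assuming the standard /sdapi/v1/options route and the /controlnet/model_list route added by the extension):

    import requests

    WEBUI = "http://127.0.0.1:7860"

    # Checkpoint currently loaded by the WebUI (tells you whether you are on SD1.x or SDXL).
    options = requests.get(f"{WEBUI}/sdapi/v1/options").json()
    print("Active SD checkpoint:", options["sd_model_checkpoint"])

    # ControlNet models the extension can see; pick one built for the same SD version
    # as the active checkpoint (e.g. a *_sd15_* model for SD1.x checkpoints).
    cn_models = requests.get(f"{WEBUI}/controlnet/model_list").json()["model_list"]
    print("Available ControlNet models:", cn_models)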

Why doesn't my generated image resemble the original when using SDXL Pony based models?

I’m not sure of the exact cause of this issue, but I’ve concluded that some community-made ControlNet models are more compatible with SDXL Pony-based models than others. During my tests, the best results I achieved were with MistoLine 1.

Why do objects detected by YOLOv8 occasionally show up duplicated in the output results when using an SDXL mode?

I encountered this issue while testing the MangaXL mode with my YOLOv8 model for speech bubble segmentation. I concluded that the combination of certain SDXL models and ControlNet models causes ControlNet to attempt to recreate the object in an incorrect position. For reference, this issue often occurred with bdsqlsz models but rarely happened with MistoLine 1. So using a different ControlNet model that is compatible with SDXL should resolve the issue.

Footnotes

  1. MistoLine is an SDXL ControlNet model that can adapt to any type of line art input, meaning it can be used with multiple modules (anime_denoise, canny, etc.). From some quick tests, I've found that this model tends to produce outputs closer to the original image compared to others. It also performs slightly better with SDXL Pony based models.
