
Prompt Tips

Vladimir Mandic edited this page Feb 10, 2024 · 5 revisions

Params

TL;DR: Tweak steps, CFG scale, and sampler together, as results vary depending on the combination of all three

  • Encoder
    Which text encoder to use to process the prompt; SD typically uses CLIP, but others can be substituted (BERT, GPTx, etc.)
  • Batch Size
    How many images to generate in parallel, limited by your VRAM
  • Batch Count
    How many batches to run sequentially
    So total number of images generated is batch size x batch count
  • Seed
    Initializer for noise generator
    Use same seed to have repeatable results, otherwise use random (-1)
  • CFG Scale (Classifier-Free Guidance)
    How closely the diffusion process should follow the prompt: 0 means the prompt is ignored and 30 means it is followed as strictly as possible
    Best results are usually between 7 (creative) and 13 (realistic), but the optimal value depends on your model, prompt, and parameters
  • Width & Height
    SD 1.x was trained on 512x512, SD 2.x on 768x768, and SDXL on 1024x1024; derivative fine-tunes may use different resolutions
    So typically don't stray too far from those, and use upscalers instead if high resolution is needed
    However, changing aspect ratio can change composition of image (e.g. portrait vs landscape results in close-up vs more wide angle results)
  • Steps
    Directly impacts performance
    How many iterative denoising steps to run; too few can leave the result non-converged (denoising is not complete)
    The sweet spot depends on the chosen sampler and settings, and can be as low as 10 or as high as 100
    More steps tend to increase output quality, except with non-converging (ancestral) samplers like "Euler a", which keep modifying the picture indefinitely
    At high step counts, many samplers converge to nearly the same image as one another
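
A few of the parameters above can be illustrated with a toy sketch. It assumes plain Python lists in place of real image tensors, and the helper names (total_images, make_noise, apply_cfg) are hypothetical, not part of any backend. The guidance formula is the standard classifier-free guidance blend; an actual pipeline applies it to the model's predicted noise at every step.

```python
import random
from typing import List


def total_images(batch_size: int, batch_count: int) -> int:
    """Total images = images per batch x number of sequential batches."""
    return batch_size * batch_count


def make_noise(seed: int, n: int) -> List[float]:
    """Seeded Gaussian noise: the same seed always yields the same values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def apply_cfg(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond).

    scale = 0 ignores the prompt, scale = 1 uses the prompted prediction as-is,
    and larger values push further in the direction of the prompt.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    print(total_images(batch_size=4, batch_count=3))
    print(make_noise(seed=42, n=3) == make_noise(seed=42, n=3))
    print(apply_cfg([1.0], [2.0], scale=7.0))
```

Reusing a seed reproduces the starting noise, which is why the same seed with the same settings gives repeatable images.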

Prompt Engineering

Know your model: different models were trained on different datasets, some may understand terms other models don't

Main groups

  • Mediums: best placed near the start of the prompt, right after the artist
    Examples: painting, photograph, drawing, sketch
  • Flavors: best left as separate token at the end of the prompt
    Examples: ray tracing, fine art, black and white, pixiv, artstation
  • Movements: best added to the prompt with the "as" keyword
    Examples: pop art, photorealism
  • Artists: best placed at the start of the prompt
    Examples: greg rutkowski, artgerm, dc comics, picasso

Modifiers

  • Feel: best near the end
    Examples: beautiful, sharp focus, 4k, hdr, high detailed, canon 5d
  • Composition: best at the front, but only use it if results don't match what you want
    Examples: 1man, 1woman

Negative Prompt

  • Any keyword can be specified in a negative prompt as well
    Examples: watermark

Advanced Prompt Modifiers

  • Availability depends on implementation
  • Specify importance of specific words: E.g. using (word:1.2) makes the influence of word stronger, (word:0.8) makes it weaker
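
As a rough illustration of the (word:1.2) syntax, the sketch below splits a prompt into (text, weight) pairs. It assumes a simplified grammar with no nesting or escaped parentheses, and parse_weights is a hypothetical helper, not the parser any particular backend uses.

```python
import re
from typing import List, Tuple

# Matches "(text:1.2)"; simplified, assumed grammar without nesting or escapes.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> List[Tuple[str, float]]:
    """Split a prompt into (text, weight) pairs; unweighted text gets 1.0."""
    parts = []
    pos = 0
    for match in WEIGHTED.finditer(prompt):
        # Plain text between the previous match and this one keeps weight 1.0.
        plain = prompt[pos:match.start()].strip(" ,")
        if plain:
            parts.append((plain, 1.0))
        parts.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts


if __name__ == "__main__":
    print(parse_weights("portrait of a woman, (sharp focus:1.2), (blurry:0.8)"))
```

Weights above 1.0 strengthen a term and weights below 1.0 weaken it; how the weight is applied to the text embedding depends on the implementation.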

For original backend only:

  • Alternate between words: [word1|word2] will alternate between word1 and word2 in every denoising step, blending the two concepts
  • Switch words during denoising: [word1:word2:0.3] will use word1 for the first 30% of steps, then change it to word2
  • Force inclusion of multiple objects: join them with the AND keyword
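
The alternating and switching behavior can be sketched as a per-step word selection. This is a minimal illustration assuming zero-based step numbers and a rounded switch point; the functions alternate and switch are hypothetical, and a real backend may place the boundary slightly differently.

```python
from typing import List


def alternate(words: List[str], step: int) -> str:
    """[word1|word2]: cycle through the words, one per denoising step."""
    return words[step % len(words)]


def switch(before: str, after: str, fraction: float, step: int, total_steps: int) -> str:
    """[word1:word2:0.3]: use `before` for the first `fraction` of steps, then `after`."""
    switch_at = round(fraction * total_steps)  # assumed rounding rule
    return before if step < switch_at else after


if __name__ == "__main__":
    steps = 10
    print([alternate(["cat", "dog"], s) for s in range(steps)])
    print([switch("cat", "dog", 0.3, s, steps) for s in range(steps)])
```

With 10 steps, the switch example uses the first word for steps 0 to 2 and the second word for the remaining seven steps.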

Hints

  • Use either artists or movements
    Using both may result in one overpowering the other, or in an unexpected outcome
  • Select a medium that fits the artist
    It helps the model a lot to know which medium to use when styling
  • Add action after subject
    Examples: portrait, standing, sitting
  • Moving a term toward the front of the prompt may increase its emphasis
    Example: cartoon drawing of a woman as pixar vs pixar drawing of a woman
  • Use both subject and scene keywords
    Example: woman on a beach

Example

(composition) (artist) (medium) (subject) (action) (scene) (movement) (feel) (flavor)
1woman greg rutkowski painting of a woman happy front portrait on a beach as photorealism, sharp focus, artstation
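
The same assembly can be sketched in code. This is a minimal illustration that reproduces the example prompt above; build_prompt and its argument names are hypothetical, and the choice to join the trailing modifiers with commas simply mirrors the example line.

```python
def build_prompt(composition="", artist="", medium="", subject="", action="",
                 scene="", movement="", feel="", flavor=""):
    """Assemble a prompt from named parts, skipping any that are empty."""
    # Leading parts are space-separated, as in the example prompt.
    main = " ".join(p for p in (composition, artist, medium, subject, action, scene) if p)
    # Movements are attached with the "as" keyword.
    if movement:
        main = f"{main} as {movement}".strip()
    # Trailing modifiers are kept as separate comma-delimited tokens.
    return ", ".join(p for p in (main, feel, flavor) if p)


if __name__ == "__main__":
    print(build_prompt(
        composition="1woman",
        artist="greg rutkowski",
        medium="painting of",
        subject="a woman",
        action="happy front portrait",
        scene="on a beach",
        movement="photorealism",
        feel="sharp focus",
        flavor="artstation",
    ))
```

Keeping the parts separate makes it easy to swap one group (for example, the artist or the medium) while holding the rest of the prompt and the seed fixed.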
