Note that this wiki page liberally reinterprets things I refer to, and may unintentionally misunderstand or misrepresent them.

To Do: add a links and resources section, for example to things linked to by the How To Geek article.

Stable Diffusion

Stable Diffusion is an open-source text-to-image synthesis AI that can take any natural-language English prompt and create compelling photorealistic, artistic, or otherwise imaginative results. It uses a synthesis technique called latent diffusion, as do at least a few other image synthesizers. It gets its own first-level heading here because it is free and open source.

Here is a demo page, suitable for toying with weird or cool ideas.
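
If you'd rather drive Stable Diffusion from code than from a web page, here is a minimal text-to-image sketch. It assumes Hugging Face's diffusers library and the 1.5 checkpoint; the prompt and filename are just illustrations:

    # Minimal text-to-image generation with the diffusers library (an assumption;
    # the webui discussed below wraps similar machinery with many more features).
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Any natural-language English prompt works here.
    image = pipe("a photorealistic lighthouse at dusk, dramatic sky").images[0]
    image.save("lighthouse.png")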

Prompt and parameter engineering, and various techniques afforded by Stable Diffusion, can turn out astonishing results, insofar as the quality of the images has implications for a sea change in visual arts industries, and raises concerns about authorship, plagiarism, propaganda, crime, (mis)representation of truth or reality itself, censorship, and the displacement or obsolescence of working visual artists.

I only mention those concerns to emphasize how impactful AI image synthesis has become; I make no comment on them here. What I'm after is how to make great images using AI. Incidentally, some of my AI art, often including the prompt that made it in its displayed metadata, is over here.

AUTOMATIC1111 stable-diffusion-webui

The Stable Diffusion WebUI automates installing and running Stable Diffusion on the desktop across many computer platforms, provided you have a graphics card that can run it. An NVIDIA CUDA card with a minimum of 4 GB VRAM may be required for any or all of it.
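
A quick way to check whether your card clears that bar, assuming you have Python and PyTorch installed:

    # Check for a CUDA-capable card and report how much VRAM it has.
    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
    else:
        print("No CUDA device found; the webui may not run, or only slowly on CPU.")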

Stable Diffusion General Notes and Usage

Fundamental parameters that Stable Diffusion uses for image synthesis, and their attributes:

Samplers (Sample Methods)

See this breakdown of sampler methods on Reddit (the thread contains other useful Stable Diffusion technique comments as well).

Note: These notes don't explore sampler methods that were added to the webui after I did my experimenting and exploring. Those additional samplers are: LMS Karras, DPM2 Karras, DPM2 a Karras.

All available samplers in the webui are: Euler a, Euler, LMS, Heun, DPM2, DPM2 a, DPM fast, DPM adaptive, DDIM, PLMS

Key to these; also note that where more than one of these is listed with a bullet point, the subsequent ones were designed as improvements:

General note: non-ancestral samplers (ones that don't have an “a” in their name) may generally produce very similar images from the same seed.
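
As a rough way to test that outside the webui, here is a sketch assuming Hugging Face's diffusers library, whose schedulers approximately correspond to the webui's samplers (the model name and prompt are just placeholders):

    # Render the same prompt and seed with a non-ancestral and an ancestral sampler.
    import torch
    from diffusers import (
        StableDiffusionPipeline,
        EulerDiscreteScheduler,
        EulerAncestralDiscreteScheduler,
    )

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    for scheduler_cls in (EulerDiscreteScheduler, EulerAncestralDiscreteScheduler):
        pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
        # Identical seed each time, so differences come from the sampler alone.
        generator = torch.Generator("cuda").manual_seed(42)
        image = pipe(
            "an encaustic painting of a forest",
            generator=generator,
            num_inference_steps=30,
        ).images[0]
        image.save(f"{scheduler_cls.__name__}.png")

If the general note holds, swapping in other non-ancestral schedulers at the same seed should give near-identical images, while the ancestral one drifts further away.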

Samplers and their history and observed characteristics:

“Weirdly, in some comparisons DPM2-A generates very similar images as Euler-A… on the previous seed. Might be due to it being a second-order method vs first-order, might be an experiment muck-up.”

Sampling steps sets how many iterations the sampler runs; on each iteration the sampler removes noise, and ancestral samplers also add some noise back in (I think).

Prompt Engineering

References:

While an effective overall goal may be to be as specific and detailed as possible, depending on the visual feedback you get, you may find it more effective to be general and allow the AI to interpret and do things you don't write explicitly. To that end:

Prompt Weighting, Modifiers and Experiments

Weighting

Many Stable Diffusion interfaces allow expressing a percent weight for any token (idea or part of a prompt).

(Note: temporary edit, pending experiments to confirm: I may have details wrong here. Here is a reference on attention control in stable-diffusion-webui.)

Here are ways that can be done:

encaustic:0.1

Or if it's a phrase, surround that with quotes:

"colorful encaustic:0.5"

Purportedly, if you use multiple percent weights, they should add up to less than 1. Supposedly, Stable Diffusion internally distributes any percentage you don't use among the remaining unweighted tokens. For example, if two tokens are weighted 0.5 and 0.1, the leftover 0.4 would presumably be split among the unweighted tokens.

You can also emphasize or de-emphasize tokens with syntax:
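
Per the stable-diffusion-webui attention-control docs referenced above (pending my own confirmation), that syntax works roughly like this:

    (encaustic)        multiplies attention to the token by 1.1
    ((encaustic))      multiplies by 1.1 twice
    [encaustic]        divides attention by 1.1
    (encaustic:1.5)    sets an explicit attention multiplier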

Other Experimental Modifiers

stable-diffusion-webui supports a logical style-blend operator, "AND". A prompt that includes it will cause Stable Diffusion to try a stylistic and/or object blend of the tokens to the left and right of the AND. For example:
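
An illustrative prompt of my own devising (not from the webui docs):

    a watercolor painting of a lighthouse AND an oil painting of a stormy sea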

Findings from my own experiments:

Seed

Because the same seed always produces the same starting noise, with sampling methods that don't radically alter the noise between steps (i.e., not the ancestral samplers, the ones labeled a), you can keep the same seed while tweaking other things like iterations and the prompt, and thereby fine-tune results. Also, it seems that some seeds just tend to produce better results for some types of prompts.

Variant seed and variant seed strength parameters allow you to blend one seed with another. The variant seed strength is a percentage expressed as a decimal: 0 means don't interpolate toward the variant seed at all, 0.5 means interpolate halfway, and 1 means use the variant seed and not the base seed.
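
Under the hood I believe the webui blends the initial noise from the two seeds with a spherical interpolation; here is a rough sketch of that idea (the latent shape and seed values are just illustrative, and slerp as the blend function is my assumption):

    # Blend the starting noise of a base seed and a variant seed.
    import torch

    def slerp(t, a, b):
        """Spherical linear interpolation between two noise tensors."""
        a_flat, b_flat = a.flatten(), b.flatten()
        omega = torch.acos(
            torch.clamp(
                torch.dot(a_flat / a_flat.norm(), b_flat / b_flat.norm()), -1, 1
            )
        )
        so = torch.sin(omega)
        if so.abs() < 1e-6:  # nearly parallel: plain lerp is fine
            return (1 - t) * a + t * b
        return (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b

    shape = (1, 4, 64, 64)  # latent noise shape for a 512x512 image
    base = torch.randn(shape, generator=torch.Generator().manual_seed(1234))
    variant = torch.randn(shape, generator=torch.Generator().manual_seed(5678))

    strength = 0.5  # variant seed strength: 0 = base seed only, 1 = variant only
    blended = slerp(strength, base, variant)

Stepping strength from 0 to 1 and rendering each blend is essentially what the Seed Travel script below does.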

Variant seed and variation strength parameters can give subtle changes if you change the value of the latter a little bit.

The Seed Travel script does exactly that: it sets a second seed as the variant and increments the variation strength over time from 0 to 1. “Samplers that work well are: Euler, LMS, Heun, DPM2 & DDIM.” Specifically, if you use ancestral samplers (the ones with a in the name), it won't work: you'll get some animation, but it will abruptly jump between variations.

Inpainting

Inpainting allows you to change part of an image using a mask and a text-to-image synthesis prompt.

Forgiving the shameless (though ultimately PG-rated) juvenile male gaze in this demo video, it's clear that (at this writing) the Stable Diffusion 1.5 model devoted to inpainting is superior.
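
For the scriptable route, here is an inpainting sketch assuming the diffusers library and the dedicated 1.5 inpainting checkpoint (the file names are placeholders):

    # Repaint only the masked region of an image, guided by a prompt.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open("input.png").convert("RGB").resize((512, 512))
    mask = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = repaint

    result = pipe(
        prompt="a stained glass window", image=image, mask_image=mask
    ).images[0]
    result.save("inpainted.png")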

Image to Image

The same video demonstrates that the same model is also superior for AI-generating an image based on another image.
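
A sketch of image-to-image generation, again assuming diffusers (the strength parameter here is the denoising strength, controlling how far the result departs from the source image):

    # Generate a new image guided by both a prompt and a source image.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    source = Image.open("source.png").convert("RGB").resize((512, 512))
    result = pipe(
        prompt="the same scene as an encaustic painting",
        image=source,
        strength=0.6,  # 0 = return the source untouched, 1 = ignore it entirely
    ).images[0]
    result.save("img2img.png")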

AI image upscaling

My review of upscalers for purposes of upscaling abstract art that is like marbled watery acrylic + watercolor:

Potentially good enough by themselves:

Possibly useful in combination or layering / processing with the above:

I couldn't try LDSR, as I get a certificate error when I do. Is BSRGAN new since I wrote this? I get a memory allocation error when trying to use it.

From toying with Stable Diffusion upscaling settings, I get really cool details with Sampling Steps 8, CFG Scale 13, and Denoising Strength 0.46. But Denoising Strength isn't accessible in regular (txt2img) Stable Diffusion use, I think? Also, this is for derived (upscaled) images.
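
Those settings come from the webui's SD upscale script; a rough equivalent outside the webui, assuming diffusers, is to enlarge the image conventionally and then run img2img over it at a low denoising strength. My settings above are plugged in; note the webui also tiles large images, which this sketch skips:

    # Enlarge conventionally, then let img2img re-detail the result.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    low_res = Image.open("abstract.png").convert("RGB")
    enlarged = low_res.resize((low_res.width * 2, low_res.height * 2), Image.LANCZOS)

    result = pipe(
        prompt="marbled watery acrylic and watercolor abstract",
        image=enlarged,
        strength=0.46,           # Denoising Strength
        guidance_scale=13,       # CFG Scale
        num_inference_steps=8,   # Sampling Steps
    ).images[0]
    result.save("upscaled.png")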

References and Resources

Comments: That's an amazing work of reference and generated images. Also, the claim of “all” is probably extremely far-fetched. With a bit of searching, and off the top of my head, I come up with these prompts, for styles of artists not on that list, which produce images representative of their technique and style:

If it's even possible to define or know “all” artists represented in Stable Diffusion, my guess is that list doesn't even remotely approach “all.”