We examine the unreasonable effectiveness of classifier-free guidance (CFG).
CFG is the dominant method of conditional sampling for text-to-image diffusion models, but
unlike other aspects of diffusion, it remains on shaky theoretical footing. In this paper, we disprove common misconceptions by showing that CFG interacts differently with DDPM and DDIM, and that neither sampler with CFG generates the gamma-powered distribution.
Then, we clarify the behavior of CFG by showing that it is a kind of Predictor-Corrector (PC) method that alternates between denoising and sharpening, which we call Predictor-Corrector Guidance (PCG).
We show that in the SDE limit, DDPM-CFG is equivalent to PCG
with a DDIM predictor applied to the conditional distribution and a Langevin dynamics corrector applied to a gamma-powered distribution. Whereas the standard PC corrector applies to the conditional distribution and improves sampling accuracy, our corrector sharpens the distribution.
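
For concreteness, the two objects named above can be written as follows. This is a sketch in standard notation, not necessarily the paper's own definitions: the symbols $p_t$ (the data distribution noised to time $t$), $c$ (the conditioning, e.g. a text prompt), and $\gamma$ (the guidance weight) are assumptions introduced here.

```latex
% CFG score: a weighted combination of the unconditional and conditional scores
\nabla_x \log p_{t,\gamma}(x \mid c)
  = (1-\gamma)\,\nabla_x \log p_t(x) + \gamma\,\nabla_x \log p_t(x \mid c)

% Gamma-powered distribution at noise level t (defined up to normalization)
p_{t,\gamma}(x \mid c) \;\propto\; p_t(x)^{1-\gamma}\, p_t(x \mid c)^{\gamma}
```

With $\gamma = 1$ both expressions reduce to ordinary conditional sampling, and $\gamma > 1$ weights the conditional term more heavily. The second line defines a separate distribution at each noise level $t$; it does not by itself imply that these are the noised versions of the $t = 0$ distribution.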