Latent Diffusion
Latent diffusion runs the diffusion process in a compressed latent space instead of pixel space, which is roughly 10-100x faster at comparable quality.
Because images are compressed before denoising, generation needs far less compute, which is what lets Stable Diffusion run on consumer GPUs.
Explanation
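A VAE encoder compresses images into a much smaller latent representation (e.g., a 512×512 image → a 64×64 latent). The diffusion model adds and removes noise in that latent space, and a VAE decoder reconstructs the final image from the denoised latent. This architecture makes Stable Diffusion and Flux practical on consumer hardware; DALL-E 3 is also reported to use a latent diffusion decoder, although it is only available as a hosted service.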
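The three stages can be sketched at the level of shapes. This is a toy illustration, not a real model: `encode`, `decode`, `denoise_step`, and `generate` are hypothetical names, block averaging and block repetition stand in for a trained VAE, the "denoiser" merely shrinks values, and a single channel is assumed for brevity.

```python
import random

FACTOR = 8   # downsampling per side, matching the 512 -> 64 example
STEPS = 20   # denoising iterations (assumed; low end of a 20-50 range)


def encode(image):
    """Toy stand-in for the VAE encoder: average each FACTOR x FACTOR block."""
    size = len(image) // FACTOR
    return [
        [
            sum(
                image[r * FACTOR + i][c * FACTOR + j]
                for i in range(FACTOR)
                for j in range(FACTOR)
            ) / FACTOR**2
            for c in range(size)
        ]
        for r in range(size)
    ]


def decode(latent):
    """Toy stand-in for the VAE decoder: repeat each latent value over a block."""
    size = len(latent) * FACTOR
    return [[latent[r // FACTOR][c // FACTOR] for c in range(size)] for r in range(size)]


def denoise_step(latent):
    """Placeholder for one denoiser call; a real model predicts noise with a neural network."""
    return [[0.9 * value for value in row] for row in latent]


def generate(latent_size=64, steps=STEPS, seed=0):
    """Start from Gaussian noise in latent space, denoise there, then decode once."""
    rng = random.Random(seed)
    latent = [[rng.gauss(0.0, 1.0) for _ in range(latent_size)] for _ in range(latent_size)]
    for _ in range(steps):
        latent = denoise_step(latent)  # every step touches only 64 x 64 values
    return decode(latent)


image = generate()
print(len(image), len(image[0]))      # 512 512
latent = encode(image)
print(len(latent), len(latent[0]))    # 64 64
```

The point of the sketch is where the loop runs: all iterations happen on the small latent grid, and the expensive full-resolution step (decoding) happens only once at the end.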
Marketing Relevance
Latent diffusion is a key innovation behind the democratization of image generation: without it, high-resolution text-to-image generation would largely have remained limited to large GPU clusters.
Example
Stable Diffusion represents a 512×512 image as a 64×64 latent, denoises in that space in 20-50 steps, and decodes the result back to 512×512, instead of denoising 512×512 pixels directly.
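The size of the saving in this example can be checked with simple arithmetic. The channel counts are assumptions not stated above: 3 channels for an RGB image and 4 latent channels, as in the Stable Diffusion 1.x VAE. `num_values` is a hypothetical helper.

```python
def num_values(height, width, channels):
    """Number of scalar values in a height x width x channels tensor."""
    return height * width * channels


pixel_values = num_values(512, 512, 3)    # RGB image (assumed 3 channels)
latent_values = num_values(64, 64, 4)     # latent (assumed 4 channels)

spatial_reduction = (512 * 512) // (64 * 64)     # fewer spatial positions
value_reduction = pixel_values / latent_values   # fewer values per denoising step

print(pixel_values)       # 786432
print(latent_values)      # 16384
print(spatial_reduction)  # 64
print(value_reduction)    # 48.0
```

Each denoising step therefore processes about 48 times fewer values under these assumptions. The actual speedup depends on the denoiser architecture and hardware, so this ratio is an indication rather than a measured figure.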
Common Pitfalls
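The VAE decoder can lose fine details, because the latent space has finite capacity and cannot store everything in the original image. The quality of the VAE's training therefore strongly influences final image quality.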
Origin & History
Rombach, Blattmann et al. (LMU Munich and Runway) published "High-Resolution Image Synthesis with Latent Diffusion Models" on arXiv in December 2021; it was presented at CVPR 2022. The paper combined a pretrained autoencoder with diffusion in its latent space, making high-resolution image synthesis feasible with far less compute than pixel-space diffusion. Stable Diffusion (August 2022), released with support from Stability AI, is directly based on this architecture.
Comparisons & Differences
Latent Diffusion vs. Pixel-Space Diffusion
Latent diffusion operates in a compressed space (fast, efficient); pixel-space diffusion operates directly on pixels (slower, with comparable quality).
Latent Diffusion vs. VAE
A VAE is one component of latent diffusion (the encoder and decoder); latent diffusion is the complete system, which runs diffusion in the VAE's latent space.