Text-to-3D using 2D Diffusion
Given a caption, DreamFusion uses a text-to-image generative model called Imagen to optimize a 3D scene. They propose Score Distillation Sampling (SDS), a way to generate samples from a diffusion model by optimizing a loss function. SDS allows us to optimize samples in an arbitrary parameter space, such as a 3D space, as long as They can map back to images differentiably. They use a 3D scene parameterization similar to Neural Radiance Fields, or NeRFs, to define this differentiable mapping. SDS alone produces reasonable scene appearance, but DreamFusion adds additional regularizers and optimization strategies to improve geometry. The resulting trained NeRFs are coherent, with high-quality normals, surface geometry and depth, and are relightable with a Lambertian shading model.