Large-scale GAN for text-to-image synthesis
GigaGAN is a novel GAN architecture that scales far beyond the previous limits of GANs and can produce ultra-HD images.
With 1 billion parameters, GigaGAN achieves a lower FID than Stable Diffusion v1.5, DALL·E 2, and Parti-750M. It generates 512px outputs in 0.13 s, orders of magnitude faster than diffusion and autoregressive models, and inherits the disentangled, continuous, and controllable latent space of GANs. The authors also train a fast upsampler that can generate 4K images from the low-res outputs of text-to-image models.
- ✅ Authors: POSTECH + CMU + Adobe 🚀
- ✅ GAN-based billion-parameter model trained on billions of images
- ✅ 36× larger than StyleGAN, 6× larger than StyleGAN-XL
- ✅ Text-conditioned GAN-upsampling >> DALLE
- ✅ Ultra-HD images at 4K resolution in 3.66 s
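
For a rough sense of scale, the figures quoted above can be turned into implied model sizes and throughput. This is only a back-of-the-envelope sketch: it assumes "1 billion" means exactly 1e9 parameters and that the 36× and 6× ratios are exact, so the results are approximations derived from this post, not official numbers. The helper names are made up for illustration.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in this post.
# Assumption: "1 billion parameters" is taken as exactly 1e9.
GIGAGAN_PARAMS = 1_000_000_000


def implied_params(ratio: float, total: int = GIGAGAN_PARAMS) -> float:
    """Size of a model that is `ratio` times smaller than `total`."""
    return total / ratio


def images_per_second(seconds_per_image: float) -> float:
    """Convert a per-image latency into single-stream throughput."""
    return 1.0 / seconds_per_image


stylegan_params = implied_params(36)     # "36× larger than StyleGAN"
stylegan_xl_params = implied_params(6)   # "6× larger than StyleGAN-XL"
throughput_512 = images_per_second(0.13)  # 512px output in 0.13 s
throughput_4k = images_per_second(3.66)   # 4K output in 3.66 s

print(f"StyleGAN (implied):    ~{stylegan_params / 1e6:.0f}M parameters")
print(f"StyleGAN-XL (implied): ~{stylegan_xl_params / 1e6:.0f}M parameters")
print(f"512px throughput:      ~{throughput_512:.1f} images/s")
print(f"4K throughput:         ~{throughput_4k:.2f} images/s")
```

Under these assumptions the ratios imply roughly 28M parameters for StyleGAN and 167M for StyleGAN-XL, and the quoted latencies work out to about 7.7 images per second at 512px and about 0.27 images per second at 4K.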