Large-scale GAN for text-to-image synthesis

About GigaGAN

GigaGAN is a novel GAN architecture that scales far beyond the previous limits of GANs and can produce ultra-HD images.

With 1 billion parameters, GigaGAN achieves a lower FID than Stable Diffusion v1.5, DALL·E 2, and Parti-750M. It generates 512px outputs in 0.13 s, orders of magnitude faster than diffusion and autoregressive models, and inherits the disentangled, continuous, and controllable latent space of GANs. The authors also train a fast upsampler that can generate 4K images from the low-res outputs of text-to-image models.


  • ✅ Authors: POSTECH + CMU + Adobe 🚀
  • ✅ GAN-based billion-parameter model trained on billions of images
  • ✅ 36× larger than StyleGAN, 6× larger than StyleGAN-XL
  • ✅ Text-conditioned GAN upsampling, claimed to far outperform DALL·E
  • ✅ Ultra-HD images at 4K resolution in 3.66 s
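
To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. It only rearranges the numbers stated above (1 billion parameters, the 36× and 6× size ratios, 0.13 s and 3.66 s per image). The helper names are hypothetical, and the derived values are approximations implied by those figures, not numbers reported by the authors.

```python
# Figures quoted in the text above; everything derived from them is an estimate.
GIGAGAN_PARAMS = 1_000_000_000   # "1 billion parameters"
RATIO_VS_STYLEGAN = 36           # "36x larger than StyleGAN"
RATIO_VS_STYLEGAN_XL = 6         # "6x larger than StyleGAN-XL"
SECONDS_PER_512PX = 0.13         # one 512px image
SECONDS_PER_4K = 3.66            # one 4K image


def implied_params(total_params: float, ratio: float) -> float:
    """Parameter count of the smaller model implied by a size ratio."""
    return total_params / ratio


def images_per_second(seconds_per_image: float) -> float:
    """Throughput for sequential, single-image generation."""
    return 1.0 / seconds_per_image


if __name__ == "__main__":
    stylegan = implied_params(GIGAGAN_PARAMS, RATIO_VS_STYLEGAN)
    stylegan_xl = implied_params(GIGAGAN_PARAMS, RATIO_VS_STYLEGAN_XL)
    print(f"Implied StyleGAN size:    ~{stylegan / 1e6:.0f}M parameters")
    print(f"Implied StyleGAN-XL size: ~{stylegan_xl / 1e6:.0f}M parameters")
    print(f"512px throughput: ~{images_per_second(SECONDS_PER_512PX):.1f} images/s")
    print(f"4K throughput:    ~{images_per_second(SECONDS_PER_4K):.2f} images/s")
```

The throughput figures assume images are generated one at a time on the same hardware used for the quoted timings, so batching or different hardware would change them.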
