Understanding Over-parameterization in Generative Adversarial Networks
Authors: Yogesh Balaji, Mohammadmahdi Sajedi, Neha Mukund Kalibhat, Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we present a comprehensive analysis of the importance of model overparameterization in GANs both theoretically and empirically. We theoretically show that in an overparameterized GAN model with a 1-layer neural network generator and a linear discriminator, GDA (gradient descent-ascent) converges to a global saddle point of the underlying non-convex concave min-max problem. We also empirically study the role of model overparameterization in GANs using several large-scale experiments on CIFAR-10 and Celeb-A datasets. (A minimal GDA sketch for this setting follows the table.) |
| Researcher Affiliation | Academia | Yogesh Balaji¹, Mohammadmahdi Sajedi², Neha Mukund Kalibhat¹, Mucong Ding¹, Dominik Stöger², Mahdi Soltanolkotabi², Soheil Feizi¹. ¹ University of Maryland, College Park, MD; ² University of Southern California, Los Angeles, CA |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | In this section, we demonstrate benefits of overparameterization in large GAN models. In particular, we train GANs on two benchmark datasets: CIFAR-10 (32×32 resolution) and Celeb-A (64×64 resolution). |
| Dataset Splits | No | The paper mentions using a "held-out validation set" for FID scores but does not specify the exact percentages or sample counts for training/validation/test splits needed to reproduce the data partitioning for model training. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Adam" as an optimizer but does not specify version numbers for any software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | Both DCGAN and Resnet-based GAN models are optimized using the commonly used hyper-parameters: Adam with learning rate 0.0001 and betas (0.5, 0.999) for DCGAN, gradient penalty of 10 and 5 critic iterations per generator's iteration for both DCGAN and Resnet-based GAN models. Models are trained for 300,000 iterations with a batch size of 64. (A hedged WGAN-GP wiring sketch follows the table.) |
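
The theoretical setting quoted under "Research Type" (a 1-hidden-layer generator against a linear discriminator, trained by simultaneous gradient descent-ascent) can be illustrated with a small self-contained sketch. This is not the authors' code: the ReLU hidden layer, the Wasserstein-style linear critic objective, and all dimensions and step sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of simultaneous gradient descent-ascent (GDA)
# on a 1-hidden-layer generator against a linear discriminator. Widths, data, the
# ReLU nonlinearity, and step sizes are illustrative assumptions.
import torch

d, k, n = 8, 256, 512                 # data dim, hidden width (overparameterized), sample count
x_real = torch.randn(n, d)            # stand-in for the real data distribution

# Generator: z -> relu(z @ W1) @ W2 ; Discriminator: linear critic x -> x @ v
W1 = (torch.randn(d, k) / k ** 0.5).requires_grad_()
W2 = (torch.randn(k, d) / k ** 0.5).requires_grad_()
v = (0.01 * torch.randn(d)).requires_grad_()

eta_g, eta_d = 1e-2, 1e-2
for step in range(2000):
    z = torch.randn(n, d)
    x_fake = torch.relu(z @ W1) @ W2
    # Min-max objective with a linear critic: the discriminator maximizes, the generator minimizes
    obj = (x_real @ v).mean() - (x_fake @ v).mean()
    g_W1, g_W2, g_v = torch.autograd.grad(obj, (W1, W2, v))
    with torch.no_grad():
        W1 -= eta_g * g_W1            # generator: gradient descent step
        W2 -= eta_g * g_W2
        v += eta_d * g_v              # discriminator: gradient ascent step (simultaneous GDA)
```

The objective is linear (hence concave) in the critic parameters `v` and non-convex in the generator weights, matching the "non-convex concave" min-max structure referenced in the quoted abstract.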
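Similarly, the quoted training setup (Adam with learning rate 0.0001 and betas (0.5, 0.999), gradient penalty of 10, 5 critic iterations per generator iteration, 300,000 iterations, batch size 64) maps onto a standard WGAN-GP loop. The sketch below is a hedged reconstruction: `Generator`, `Discriminator`, and `real_batch` are placeholder stand-ins, not the paper's DCGAN/ResNet architectures or data pipeline; only the hyper-parameter values come from the quoted text.

```python
# Hedged sketch of a WGAN-GP training loop wired with the reported hyper-parameters.
# Architectures and the data sampler are placeholders, not the authors' models.
import torch
import torch.nn as nn

class Generator(nn.Module):           # placeholder stand-in
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 3 * 32 * 32), nn.Tanh())
    def forward(self, z):
        return self.net(z).view(-1, 3, 32, 32)

class Discriminator(nn.Module):       # placeholder stand-in
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3 * 32 * 32, 1)
    def forward(self, x):
        return self.net(x.view(x.size(0), -1))

def gradient_penalty(D, real, fake, lam=10.0):
    """WGAN-GP penalty with the reported coefficient of 10."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    return lam * ((grads.view(grads.size(0), -1).norm(2, dim=1) - 1) ** 2).mean()

z_dim, batch_size, n_critic, total_iters = 128, 64, 5, 300_000
G, D = Generator(z_dim), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.5, 0.999))

def real_batch():
    # stand-in sampler; in practice this would draw CIFAR-10 / Celeb-A minibatches
    return torch.rand(batch_size, 3, 32, 32) * 2 - 1

for it in range(total_iters):
    for _ in range(n_critic):         # 5 critic iterations per generator step
        real = real_batch()
        fake = G(torch.randn(batch_size, z_dim)).detach()
        d_loss = D(fake).mean() - D(real).mean() + gradient_penalty(D, real, fake)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    fake = G(torch.randn(batch_size, z_dim))
    g_loss = -D(fake).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Whether "300,000 iterations" counts generator steps or total critic steps is not specified in the quoted text; the loop above treats it as generator iterations.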