Improving GANs with A Dynamic Discriminator
Authors: Ceyuan Yang, Yujun Shen, Yinghao Xu, Deli Zhao, Bo Dai, Bolei Zhou
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A comprehensive empirical study confirms that the proposed training strategy, termed DynamicD, improves synthesis performance without incurring any additional computation cost or training objectives. Two capacity-adjusting schemes are developed for training GANs under different data regimes: i) given a sufficient amount of training data, the discriminator benefits from a progressively increased learning capacity, and ii) when the training data is limited, gradually decreasing the layer width mitigates the over-fitting issue of the discriminator (both schemes are sketched in code below the table). Experiments on both 2D and 3D-aware image synthesis tasks conducted on a range of datasets substantiate the generalizability of DynamicD as well as its substantial improvement over the baselines. |
| Researcher Affiliation | Collaboration | Ceyuan Yang (CUHK, Shanghai AI Laboratory), Yujun Shen (Ant Group), Yinghao Xu (CUHK), Deli Zhao (Ant Group), Bo Dai (Shanghai AI Laboratory), Bolei Zhou (UCLA) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and models are available at https://genforce.github.io/dynamicd. |
| Open Datasets | Yes | Datasets. In this work, several benchmarks are included to evaluate the proposed DynamicD from various perspectives. For instance, on FFHQ [29], which includes 70,000 high-resolution face images, we conduct the empirical study and comparison against prior approaches. In order to study the effect of different data regimes, we also follow ADA [30] to randomly sample a subset to set up a limited setting and double the entire dataset via horizontal flip for sufficient data, with all the images well aligned and cropped [33]. In addition, AFHQ-v2 [10] is also used to evaluate our DynamicD under the low-data regime. To be specific, AFHQ-v2 [10] consists of around 5,000 images each of dogs, cats, and wildlife. Moreover, we conduct experiments on three large scene collections, i.e., LSUN [58] outdoor church, bridge, and bedroom, which contain 126K, 818K, and 3M unique images, respectively. Besides, we also conduct 3D-aware image synthesis on a synthetic car dataset, Carla [12], containing 10,000 images rendered from 16 different car models. |
| Dataset Splits | No | The paper mentions using "entire training set" for FID calculation and also mentions specific dataset sizes (e.g., 70,000 high-resolution face images for FFHQ), but it does not provide explicit details on how these datasets are partitioned into training, validation, and testing sets, nor does it specify exact percentages or sample counts for a validation split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications. It generally discusses training GANs, which implies computational resources, but no explicit hardware information is given. |
| Software Dependencies | No | The paper does not list specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In practice, we start with half the capacity of a standard discriminator and ensure the ending capacity is identical to the original one for a fair comparison. In terms of the increasing strategy, the extending coefficient α varies from 0.5 to 0.0 such that the discriminator changes from half to full capacity. Meanwhile, we also decrease the capacity in turn via the shrinking coefficient β. To be specific, β starts at 1.0 and then gradually decreases to 0.5. As training proceeds, α is linearly annealed every n iterations, i.e., the capacity of all layers in the discriminator grows simultaneously (n = 1 in practice). A minimal code sketch of these schedules follows the table. |
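Since the paper provides no pseudocode, here is a minimal Python sketch of the two coefficient schedules quoted in the setup row. The endpoints (α: 0.5 → 0.0, β: 1.0 → 0.5, updated every n = 1 iterations) come from the paper's description; the function names and the use of a normalized `progress` variable are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two DynamicD capacity schedules (not the authors'
# code). `progress` is the fraction of training completed, in [0, 1].

def extending_coefficient(progress: float) -> float:
    """Sufficient data: alpha anneals linearly from 0.5 to 0.0, so the
    effective discriminator width (1 - alpha) grows from half to full."""
    return 0.5 * (1.0 - progress)

def shrinking_coefficient(progress: float) -> float:
    """Limited data: beta anneals linearly from 1.0 to 0.5, shrinking the
    width from full to half to curb discriminator over-fitting."""
    return 1.0 - 0.5 * progress

# Coefficients are refreshed every n iterations (n = 1 in the paper),
# so all layers change capacity simultaneously.
for progress in (0.0, 0.5, 1.0):
    alpha = extending_coefficient(progress)
    beta = shrinking_coefficient(progress)
    print(f"progress={progress:.1f}  "
          f"width_up={1.0 - alpha:.2f}  width_down={beta:.2f}")
```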
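One plausible way to apply such a width multiplier to a discriminator layer, assuming simple channel slicing (the paper's exact layer-resizing mechanism may differ), is to keep a full-width convolution and activate only a leading fraction of its filters. The `DynamicConv` class below is hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Hypothetical width-scalable conv: only the first round(mult * C_out)
    filters are active, so capacity changes without re-allocating weights."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)

    def forward(self, x: torch.Tensor, mult: float) -> torch.Tensor:
        k = max(1, round(mult * self.conv.out_channels))
        # Slice the output filters; a full model would also slice the
        # incoming channels of the next layer to match.
        return F.conv2d(x, self.conv.weight[:k], self.conv.bias[:k],
                        padding=1)

layer = DynamicConv(64, 128)
x = torch.randn(2, 64, 32, 32)
half = layer(x, 0.5)   # 64 active output channels (start of training)
full = layer(x, 1.0)   # all 128 channels (end of training)
print(half.shape, full.shape)
```

Slicing a fixed weight tensor avoids re-allocating parameters as the capacity changes, which is one way to realize the quoted claim of no additional computation cost or training objectives.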