Local Convergence Analysis of Gradient Descent Ascent with Finite Timescale Separation

Authors: Tanner Fiez, Lillian J Ratliff

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we extend the results to gradient penalty regularization methods for generative adversarial networks and empirically demonstrate on CIFAR-10 and CelebA the significant impact timescale separation has on training performance. (Abstract) ... We now present numerical experiments and Appendix K contains further simulations and details.
Researcher Affiliation | Academia | Tanner Fiez & Lillian J. Ratliff, Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, {fiezt, ratliffl}@uw.edu
Pseudocode | No | The paper describes mathematical formulations and algorithms (e.g., 'The τ-GDA dynamics... are given by xk+1 = xk − γ1Λτ g(xk).') but does not include any structured pseudocode blocks or algorithm listings.
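For concreteness, the following is a minimal sketch of the quoted τ-GDA update xk+1 = xk − γ1Λτ g(xk) on a simple quadratic saddle problem; the example function, the value τ = 4.0, and all variable names are illustrative and not taken from the paper.

```python
def tau_gda(grad1, grad2, x1, x2, gamma1, tau, steps):
    """Sketch of the tau-GDA update x_{k+1} = x_k - gamma1 * Lambda_tau * g(x_k),
    with g(x) = (df/dx1, -df/dx2) and Lambda_tau = blockdiag(I, tau * I):
    player 1 descends f with step gamma1, player 2 ascends f with step tau * gamma1."""
    for _ in range(steps):
        g1 = grad1(x1, x2)        # df/dx1: the minimizing player's gradient
        g2 = -grad2(x1, x2)       # -df/dx2: the maximizing player's (negated) gradient
        x1, x2 = x1 - gamma1 * g1, x2 - gamma1 * tau * g2
    return x1, x2

# Illustrative saddle f(x1, x2) = 0.5*x1**2 + 2*x1*x2 - 0.5*x2**2 with critical point (0, 0).
x1_star, x2_star = tau_gda(grad1=lambda u, v: u + 2 * v,
                           grad2=lambda u, v: 2 * u - v,
                           x1=1.0, x2=1.0, gamma1=0.01, tau=4.0, steps=5000)
print(x1_star, x2_star)  # both approach 0 for sufficiently large tau and small gamma1
```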
Open Source Code | Yes | Code for the experiments is included in the supplemental material.
Open Datasets | Yes | Finally, we extend the results to gradient penalty regularization methods for generative adversarial networks and empirically demonstrate on CIFAR-10 and CelebA the significant impact timescale separation has on training performance. (Abstract)
Dataset Splits | No | The paper uses standard datasets like CIFAR-10 and CelebA for experiments and mentions 'training process' and 'evaluation' but does not specify how the datasets were formally split into training, validation, and test sets. It does not provide explicit percentages, sample counts, or references to predefined splits for these purposes.
Hardware Specification | No | The paper mentions that 'The experiments are computationally intensive' but does not provide any specific details regarding the hardware used, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions using RMSprop ('We run RMSprop with parameter α = 0.99') and a PyTorch-based FID implementation ('We used the FID score implementation in pytorch available at https://github.com/mseitzer/pytorch-fid'). However, it does not specify version numbers for PyTorch or any other software dependencies, which are necessary for full reproducibility.
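Assuming real and generated samples have been written to image folders (the directory names below are hypothetical), the referenced pytorch-fid package can be invoked through its documented command-line entry point; a minimal sketch:

```python
import subprocess

# Hypothetical paths; the paper does not state where real/generated images were saved.
real_dir = "data/cifar10_real"
fake_dir = "samples/generator_ema"

# github.com/mseitzer/pytorch-fid exposes a CLI: python -m pytorch_fid <real_dir> <fake_dir>.
# Shelling out to it keeps this sketch independent of the package's internal Python API.
subprocess.run(["python", "-m", "pytorch_fid", real_dir, fake_dir], check=True)
```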
Experiment Setup | Yes | We fix the initial learning rate for the generator to be γ1 = 0.0001 with CIFAR-10 and γ1 = 0.00005 for CelebA. The learning rates are decayed so that γ1,k = γ1/(1 + ν)^k and γ2,k = τγ1,k are the generator and discriminator learning rates at update k, where ν = 0.005. The batch size is 64, the latent data is drawn from a standard normal of dimension 256, and the resolution of the images is 32 × 32 × 3. We run RMSprop with parameter α = 0.99 and retain an exponential moving average of the generator parameters for evaluation with parameter β = 0.9999. (Section 5) ... In Figure 18 we include the hyperparameters that were selected. (Appendix K.7.2)
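A minimal PyTorch sketch of the quoted optimizer and schedule configuration, assuming γ1/(1 + ν)^k denotes per-update exponential decay; the stand-in generator/discriminator modules and the value τ = 4.0 are illustrative, since the selected τ values are only reported in Figure 18 of the paper.

```python
import torch
import torch.nn as nn

# Stand-in networks only; the paper's actual architectures are described in its appendix.
generator = nn.Sequential(nn.Linear(256, 3 * 32 * 32), nn.Tanh())  # latent dim 256 -> 32x32x3 images
discriminator = nn.Sequential(nn.Linear(3 * 32 * 32, 1))

gamma1, nu, tau = 1e-4, 0.005, 4.0   # CIFAR-10 generator LR and decay from the quote; tau is illustrative
opt_g = torch.optim.RMSprop(generator.parameters(), lr=gamma1, alpha=0.99)
opt_d = torch.optim.RMSprop(discriminator.parameters(), lr=tau * gamma1, alpha=0.99)

# gamma_{1,k} = gamma1 / (1 + nu)^k and gamma_{2,k} = tau * gamma_{1,k} at update k
# (call sched_g.step() / sched_d.step() once per update).
decay = lambda k: 1.0 / (1.0 + nu) ** k
sched_g = torch.optim.lr_scheduler.LambdaLR(opt_g, decay)
sched_d = torch.optim.lr_scheduler.LambdaLR(opt_d, decay)

# Exponential moving average of generator parameters for evaluation (beta = 0.9999).
beta = 0.9999
ema_params = [p.detach().clone() for p in generator.parameters()]

def update_ema():
    with torch.no_grad():
        for ema_p, p in zip(ema_params, generator.parameters()):
            ema_p.mul_(beta).add_(p, alpha=1.0 - beta)
```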