Local Convergence Analysis of Gradient Descent Ascent with Finite Timescale Separation
Authors: Tanner Fiez, Lillian J Ratliff
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we extend the results to gradient penalty regularization methods for generative adversarial networks and empirically demonstrate on CIFAR-10 and CelebA the significant impact timescale separation has on training performance. (Abstract) ... We now present numerical experiments and Appendix K contains further simulations and details. |
| Researcher Affiliation | Academia | Tanner Fiez & Lillian J. Ratliff, Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, {fiezt, ratliffl}@uw.edu |
| Pseudocode | No | The paper describes mathematical formulations and algorithms (e.g., 'The τ-GDA dynamics... are given by x_{k+1} = x_k − γ_1Λ_τ g(x_k).') but does not include any structured pseudocode blocks or algorithm listings. (A minimal sketch of this update on a toy game is given below the table.) |
| Open Source Code | Yes | Code for the experiments is included in the supplemental material. |
| Open Datasets | Yes | Finally, we extend the results to gradient penalty regularization methods for generative adversarial networks and empirically demonstrate on CIFAR-10 and CelebA the significant impact timescale separation has on training performance. (Abstract) |
| Dataset Splits | No | The paper uses standard datasets such as CIFAR-10 and CelebA and mentions a 'training process' and 'evaluation', but it does not specify how the datasets were split into training, validation, and test sets, nor does it provide explicit percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The paper mentions that 'The experiments are computationally intensive' but does not provide any specific details regarding the hardware used, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions using RMSprop ('We run RMSprop with parameter α = 0.99') and a PyTorch-based FID implementation ('We used the FID score implementation in pytorch available at https://github.com/mseitzer/pytorch-fid'). However, it does not specify version numbers for PyTorch or any other software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | We fix the initial learning rate for the generator to be γ1 = 0.0001 with CIFAR-10 and γ1 = 0.00005 for CelebA. The learning rates are decayed so that γ_{1,k} = γ_1/(1 + ν)^k and γ_{2,k} = τγ_{1,k} are the generator and discriminator learning rates at update k, where ν = 0.005. The batch size is 64, the latent data is drawn from a standard normal of dimension 256, and the resolution of the images is 32 × 32 × 3. We run RMSprop with parameter α = 0.99 and retain an exponential moving average of the generator parameters for evaluation with parameter β = 0.9999. (Section 5) ... In Figure 18 we include the hyperparameters that were selected. (Appendix K.7.2) (A hedged configuration sketch based on these values is given below the table.) |
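
To make the quoted τ-GDA update concrete, here is a minimal sketch of x_{k+1} = x_k − γ_1Λ_τ g(x_k) on a toy zero-sum quadratic game. The toy objective, its coefficients, the step budget, and the specific values of γ_1 and τ are illustrative assumptions; only the update rule itself and the role of the timescale separation τ are taken from the paper.

```python
import numpy as np

# Minimal sketch of the tau-GDA update x_{k+1} = x_k - gamma1 * Lambda_tau * g(x_k).
# Toy zero-sum objective (assumed for illustration):
#   f(x1, x2) = -0.5*x1**2 + 2*x1*x2 - 0.5*x2**2,
# where player 1 minimizes f over x1 and player 2 maximizes f over x2.
gamma1, tau = 0.01, 4.0

def game_gradient(x1, x2):
    """g(x) = (grad_x1 f, -grad_x2 f): gradient for the minimizer,
    negated gradient for the maximizer."""
    g1 = -x1 + 2.0 * x2      # df/dx1
    g2 = -(2.0 * x1 - x2)    # -(df/dx2)
    return np.array([g1, g2])

x = np.array([1.0, 1.0])          # joint iterate (x1, x2)
Lambda_tau = np.diag([1.0, tau])  # Lambda_tau = blkdiag(I, tau * I)

for k in range(2000):
    x = x - gamma1 * Lambda_tau @ game_gradient(x[0], x[1])

# With tau = 4 the iterates approach the critical point (0, 0); with tau = 1
# they slowly spiral away, illustrating the finite threshold tau* the paper analyzes.
print(x)
```

In the paper's GAN experiments, x would instead collect the generator and discriminator parameters and g would be computed by backpropagation through the gradient-penalty-regularized losses.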
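
Similarly, the quoted experiment setup can be turned into a configuration sketch. The learning rate, decay schedule, RMSprop smoothing constant, EMA decay, latent dimension, and image resolution below are the values quoted above (CIFAR-10 case); the placeholder networks, the chosen value of τ, and the helper names `set_learning_rates` and `update_ema` are hypothetical scaffolding rather than the authors' code.

```python
import copy
import torch
import torch.nn as nn

gamma1 = 1e-4        # generator base learning rate (CIFAR-10; 5e-5 for CelebA)
nu     = 0.005       # learning rate decay parameter
tau    = 4.0         # timescale separation (assumed; the paper sweeps this value)
alpha  = 0.99        # RMSprop smoothing constant
beta   = 0.9999      # EMA decay for the generator copy used at evaluation
latent_dim = 256

# Placeholder MLPs standing in for the actual GAN architectures (assumed).
generator = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(), nn.Linear(512, 32 * 32 * 3))
discriminator = nn.Sequential(nn.Linear(32 * 32 * 3, 512), nn.ReLU(), nn.Linear(512, 1))

# tau-GDA realized through two RMSprop optimizers whose learning rates differ by tau.
opt_g = torch.optim.RMSprop(generator.parameters(), lr=gamma1, alpha=alpha)
opt_d = torch.optim.RMSprop(discriminator.parameters(), lr=tau * gamma1, alpha=alpha)

# Exponential moving average of the generator parameters, retained for evaluation.
ema_generator = copy.deepcopy(generator)
for p in ema_generator.parameters():
    p.requires_grad_(False)

def set_learning_rates(k):
    """Apply the decay gamma_{1,k} = gamma1 / (1 + nu)**k and gamma_{2,k} = tau * gamma_{1,k}."""
    lr_g = gamma1 / (1.0 + nu) ** k
    for group in opt_g.param_groups:
        group["lr"] = lr_g
    for group in opt_d.param_groups:
        group["lr"] = tau * lr_g

def update_ema():
    """Blend the current generator parameters into the evaluation copy."""
    for p_ema, p in zip(ema_generator.parameters(), generator.parameters()):
        p_ema.mul_(beta).add_(p.detach(), alpha=1.0 - beta)
```

At each update k one would call `set_learning_rates(k)`, step `opt_d` and `opt_g` on the gradient-penalty-regularized losses with batches of size 64, and then call `update_ema()`, with FID evaluated on samples from `ema_generator`; the training loop itself is omitted here.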