Stable Nonconvex-Nonconcave Training via Linear Interpolation
Authors: Thomas Pethick, Wanyun Xie, Volkan Cevher
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We corroborate the results with experiments on generative adversarial networks which demonstrate the benefits of the linear interpolation present in both RAPP and Lookahead. |
| Researcher Affiliation | Academia | Thomas Pethick EPFL (LIONS) thomas.pethick@epfl.ch Wanyun Xie EPFL (LIONS) wanyun.xie@epfl.ch Volkan Cevher EPFL (LIONS) volkan.cevher@epfl.ch |
| Pseudocode | Yes | Algorithm 1 Relaxed approximate proximal point method (RAPP) |
| Open Source Code | No | The paper does not provide any concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | We demonstrate the methods on the CIFAR10 dataset (Krizhevsky et al., 2009). |
| Dataset Splits | No | The paper mentions tuning learning rates and update ratios, but does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | The learning rates are tuned for GDA and those parameters are kept fixed across all other methods. The first experiment we conduct matches the setting of Chavdarova et al. (2020) by relying on the Adam optimizer and using an update ratio of 5:1 between the discriminator and generator. We additionally simplify the setup by using GDA-based optimizers with an update ratio of 1:1. |
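
The table above references Algorithm 1 (RAPP) and a Lookahead-style setup, both of which revolve around linearly interpolating back toward an anchor point after a block of inner updates. As a rough illustration of that idea rather than the authors' implementation, the following minimal Python sketch wraps Lookahead-style linear interpolation around plain gradient descent-ascent (GDA) on a bilinear saddle problem; the objective, step size, inner-loop length, and interpolation weight `lam` are illustrative assumptions.

```python
import math

# Minimal sketch (not the authors' code): Lookahead/RAPP-style linear
# interpolation wrapped around gradient descent-ascent (GDA).
# The bilinear objective, eta, inner_steps, and lam are illustrative
# assumptions chosen only to make the stabilizing effect visible.

def gda_step(x, y, eta=0.1):
    """One GDA step on f(x, y) = x * y, where plain GDA spirals outward."""
    grad_x, grad_y = y, x            # df/dx = y, df/dy = x
    return x - eta * grad_x, y + eta * grad_y

def interpolated_gda(x, y, outer_steps=200, inner_steps=10, lam=0.5):
    """Run blocks of GDA steps, then interpolate back toward the anchor."""
    for _ in range(outer_steps):
        x_bar, y_bar = x, y
        for _ in range(inner_steps):         # inner "fast" updates
            x_bar, y_bar = gda_step(x_bar, y_bar)
        # linear interpolation (relaxation) toward the anchor point
        x = x + lam * (x_bar - x)
        y = y + lam * (y_bar - y)
    return x, y

if __name__ == "__main__":
    x, y = interpolated_gda(1.0, 1.0)
    print(f"distance to the saddle point (0, 0): {math.hypot(x, y):.2e}")
```

On this toy problem plain GDA spirals away from the saddle point at (0, 0), whereas the interpolated outer loop contracts toward it, which is the stabilizing effect the paper attributes to linear interpolation.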