A Convergent and Dimension-Independent Min-Max Optimization Algorithm
Authors: Vijay Keswani, Oren Mangoubi, Sushant Sachdeva, Nisheeth K. Vishnoi
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our algorithm on challenging nonconvex-nonconcave test-functions and loss functions arising in GAN training. Our algorithm converges on these test functions and, when used to train GANs, trains stably on synthetic and real-world datasets and avoids mode collapse. |
| Researcher Affiliation | Academia | (1) Department of Statistics and Data Science, Yale University, US; (2) Department of Mathematical Sciences, Worcester Polytechnic Institute, US; (3) Department of Computer Science, University of Toronto, Canada; (4) Department of Computer Science, Yale University, US. |
| Pseudocode | Yes | Algorithm 1 Our algorithm for min-max optimization |
| Open Source Code | No | Our code for the CIFAR-10 simulations is based on the code of Jason Brownlee (Brownlee, 2019), which originally used gradient descent ascent and ADAM gradients for training. Our code for the MNIST simulations is based on the code of Renu Khandelwal (Khandelwal, 2019) and Rowel Atienza (Atienza, 2017), which originally used gradient descent ascent and ADAM gradients for training. |
| Open Datasets | Yes | Gaussian mixture dataset. This synthetic dataset consists of 512 points sampled from a mixture of four equally weighted Gaussians in two dimensions with standard deviation 0.01 and means at (0, 1), (1, 0), (-1, 0), (0, -1). ... 01-MNIST and CIFAR-10. For the 01-MNIST dataset... This dataset consists of 60k images of hand-written digits (LeCun et al., 2010). (A hedged data-generation sketch for the Gaussian mixture is given after the table.) |
| Dataset Splits | No | The paper mentions 'm training examples' and training GANs on the Gaussian mixture, MNIST, and CIFAR-10 datasets, but it does not explicitly state the train/validation/test splits used in its experiments (e.g., percentages, sample counts, or predefined splits with citations). |
| Hardware Specification | Yes | The experiments were performed on four 3.0 GHz Intel Scalable CPU Processors, provided by AWS. Our simulations on the CIFAR-10 dataset were performed on the above, and using one GPU with High frequency Intel Xeon E5-2686 v4 (Broadwell) processors, provided by AWS. |
| Software Dependencies | No | The paper mentions using 'Adam learning rates' and refers to code based on 'Keras on Tensorflow backend', but it does not provide specific version numbers for any software dependencies like libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | For the simulations on Gaussian mixture data, we have used the code provided by the authors of (Metz et al., 2017) (github.com/poolio/unrolled_gan), which uses a batch size of 512, Adam learning rates of 10^-3 for the generator and 10^-4 for the discriminator, and Adam parameter β1 = 0.5 for both the generator and discriminator. ... For the CIFAR-10 simulations, we use a batch size of 128, with Adam learning rate of 0.0002 and hyperparameter β1 = 0.5 for both the generator and discriminator gradients. (An illustrative optimizer configuration with these hyperparameters follows the table.) |
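
The Open Datasets row fully specifies the synthetic Gaussian mixture, so a short sketch can make that setup concrete. This is a hedged illustration under the quoted description only, not the authors' released code: the function name `sample_gaussian_mixture` and the fixed seed are assumptions, while the sample count, standard deviation, and means come from the row above (with the minus signs restored).

```python
import numpy as np

def sample_gaussian_mixture(n_points=512, std=0.01, seed=0):
    """Draw points from four equally weighted 2-D Gaussians with standard
    deviation 0.01 and means (0, 1), (1, 0), (-1, 0), (0, -1), as described
    in the paper. The function name and seed are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    means = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
    # Choose a mixture component uniformly at random (equal weights).
    components = rng.integers(0, len(means), size=n_points)
    # Add isotropic Gaussian noise with the stated standard deviation.
    return means[components] + rng.normal(scale=std, size=(n_points, 2))

if __name__ == "__main__":
    data = sample_gaussian_mixture()
    print(data.shape)  # (512, 2)
```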
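
The Experiment Setup row quotes concrete Adam hyperparameters. The sketch below only shows how those settings could be instantiated with the Keras optimizer API (the paper notes its baseline code builds on Keras with a TensorFlow backend); it is not the authors' training loop or their proposed min-max algorithm, and all variable names are illustrative.

```python
from tensorflow.keras.optimizers import Adam

# Gaussian mixture experiments: batch size 512 (handled by the training loop),
# Adam learning rates 1e-3 (generator) and 1e-4 (discriminator), beta_1 = 0.5.
gen_opt_mixture = Adam(learning_rate=1e-3, beta_1=0.5)
disc_opt_mixture = Adam(learning_rate=1e-4, beta_1=0.5)

# CIFAR-10 experiments: batch size 128, learning rate 2e-4, beta_1 = 0.5
# for both the generator and the discriminator.
gen_opt_cifar = Adam(learning_rate=2e-4, beta_1=0.5)
disc_opt_cifar = Adam(learning_rate=2e-4, beta_1=0.5)
```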