A Convergent and Dimension-Independent Min-Max Optimization Algorithm

Authors: Vijay Keswani, Oren Mangoubi, Sushant Sachdeva, Nisheeth K. Vishnoi

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate our algorithm on challenging nonconvex-nonconcave test-functions and loss functions arising in GAN training. Our algorithm converges on these test functions and, when used to train GANs, trains stably on synthetic and real-world datasets and avoids mode collapse.
Researcher Affiliation | Academia | (1) Department of Statistics and Data Science, Yale University, US; (2) Department of Mathematical Sciences, Worcester Polytechnic Institute, US; (3) Department of Computer Science, University of Toronto, Canada; (4) Department of Computer Science, Yale University, US.
Pseudocode | Yes | Algorithm 1: Our algorithm for min-max optimization
Open Source Code | No | Our code for the CIFAR-10 simulations is based on the code of Jason Brownlee (Brownlee, 2019), which originally used gradient descent ascent and ADAM gradients for training. Our code for the MNIST simulations is based on the code of Renu Khandelwal (Khandelwal, 2019) and Rowel Atienza (Atienza, 2017), which originally used gradient descent ascent and ADAM gradients for training.
Open Datasets | Yes | Gaussian mixture dataset. This synthetic dataset consists of 512 points sampled from a mixture of four equally weighted Gaussians in two dimensions with standard deviation 0.01 and means at (0, 1), (1, 0), (-1, 0), (0, -1). ... 01-MNIST and CIFAR-10. For the 01-MNIST dataset... This dataset consists of 60k images of hand-written digits (Le Cun et al., 2010). (A short NumPy sketch of the Gaussian mixture appears below the table.)
Dataset Splits | No | The paper mentions 'm training examples' and refers to 'training GANs' on datasets like the Gaussian mixture, MNIST, and CIFAR-10, but does not explicitly provide the train/validation/test dataset splits (e.g., percentages, sample counts, or specific predefined splits with citations) used for its experiments.
Hardware Specification | Yes | The experiments were performed on four 3.0 GHz Intel Scalable CPU Processors, provided by AWS. Our simulations on the CIFAR-10 dataset were performed on the above, and using one GPU with High Frequency Intel Xeon E5-2686 v4 (Broadwell) processors, provided by AWS.
Software Dependencies | No | The paper mentions 'Adam learning rates' and refers to code based on 'Keras on Tensorflow backend', but it does not provide specific version numbers for any software dependencies such as libraries, frameworks, or programming languages.
Experiment Setup | Yes | For the simulations on Gaussian mixture data, we have used the code provided by the authors of (Metz et al., 2017) (github.com/poolio/unrolled_gan), which uses a batch size of 512, Adam learning rates of 10^-3 for the generator and 10^-4 for the discriminator, and Adam parameter β1 = 0.5 for both the generator and discriminator. ... For the CIFAR-10 simulations, we use a batch size of 128, with an Adam learning rate of 0.0002 and hyperparameter β1 = 0.5 for both the generator and discriminator gradients. (These optimizer settings are sketched in code below the table.)
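
The synthetic Gaussian mixture quoted under Open Datasets is simple enough to reproduce directly. The sketch below is our own illustration and not the authors' code; the function name sample_gaussian_mixture and the fixed random seed are assumptions made for readability.

```python
# Minimal sketch (not the authors' code) of the synthetic dataset described above:
# 512 points drawn from a mixture of four equally weighted 2-D Gaussians with
# standard deviation 0.01 and means at (0, 1), (1, 0), (-1, 0), (0, -1).
import numpy as np

def sample_gaussian_mixture(n_points=512, std=0.01, seed=0):
    rng = np.random.default_rng(seed)
    means = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
    # Pick one of the four components uniformly at random for each point,
    # then add isotropic Gaussian noise with the given standard deviation.
    components = rng.integers(0, len(means), size=n_points)
    return means[components] + std * rng.standard_normal((n_points, 2))

real_batch = sample_gaussian_mixture()  # array of shape (512, 2)
```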
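
The Experiment Setup row reports concrete optimizer hyperparameters. Since the paper's code is described as Keras on a TensorFlow backend, the following sketch shows how those reported settings map onto Keras Adam optimizers; it is an illustration under that assumption, not the authors' training loop or their min-max algorithm, and the variable names are ours.

```python
# Reported optimizer settings, expressed as Keras Adam optimizers.
# Illustrative only; the variable names and the use of tf.keras are assumptions.
from tensorflow.keras.optimizers import Adam

# Gaussian mixture experiments: lr 1e-3 (generator), 1e-4 (discriminator), beta_1 = 0.5.
mixture_generator_opt = Adam(learning_rate=1e-3, beta_1=0.5)
mixture_discriminator_opt = Adam(learning_rate=1e-4, beta_1=0.5)

# CIFAR-10 experiments: lr 2e-4 for both networks, beta_1 = 0.5.
cifar_generator_opt = Adam(learning_rate=2e-4, beta_1=0.5)
cifar_discriminator_opt = Adam(learning_rate=2e-4, beta_1=0.5)

MIXTURE_BATCH_SIZE = 512  # batch size reported for the Gaussian mixture runs
CIFAR_BATCH_SIZE = 128    # batch size reported for the CIFAR-10 runs
```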