TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization

Authors: Xiang Li, Junchi Yang, Niao He

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our algorithm is fully parameter-agnostic and can achieve near-optimal complexities simultaneously in deterministic and stochastic settings of nonconvex-strongly-concave minimax problems. The effectiveness of the proposed method is further justified numerically for a number of machine learning applications.
Researcher Affiliation | Academia | Xiang Li, Junchi Yang, Niao He; Department of Computer Science, ETH Zurich, Switzerland; {xiang.li,junchi.yang,niao.he}@inf.ethz.ch
Pseudocode | Yes | Algorithm 1: TiAda (Time-scale Adaptive Algorithm). A hedged sketch of this update is given after the table.
Open Source Code | No | The paper states that some parts 'adapt code from Lv (2019)' or 'use the code adapted from Green9 (2018)', but it provides no explicit statement of, or link to, the authors' own implementation.
Open Datasets | Yes | We conduct the experiments on the MNIST dataset (LeCun, 1998)... Another successful and popular application of minimax optimization is generative adversarial networks... with CIFAR-10 dataset (Krizhevsky et al., 2009) in our experiments.
Dataset Splits | No | The paper uses standard datasets such as MNIST and CIFAR-10, which have established training sets, but it does not specify training/validation/test split percentages or sample counts, nor does it state that the standard splits were used.
Hardware Specification | No | The paper trains deep neural networks and runs experiments but does not specify hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions adapting code (e.g., from Lv (2019) or Green9 (2018)) and using Adam-like optimizers, but it does not specify software versions (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup | Yes | In all the experiments, we merely select α = 0.6 and β = 0.4 without further tuning those two hyper-parameters. All experimental details, including the neural network structure and hyper-parameters, are described in Appendix A.1. We set the batch size as 128, and for the Adam-like optimizers, including Adam, NeAda-Adam and TiAda-Adam, we use β1 = 0.9, β2 = 0.999 for the first-moment and second-moment parameters. For the GAN experiment, we set the batch size as 512, the dimension of the latent variable as 50, and the weight of the gradient-penalty term as 10^-4; for the Adam-like optimizers, we set β1 = 0.5, β2 = 0.9. A hedged configuration sketch also follows the table.
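For orientation, here is a minimal sketch of the single-accumulator (AdaGrad-style) TiAda update as we read it from Algorithm 1: each player keeps a running sum of squared gradient norms, the y-player divides by its own accumulator raised to β, and the x-player divides by the larger of the two accumulators raised to α, which enforces the time-scale separation α > 1/2 > β without problem-dependent tuning. The step sizes, initial accumulator value, and the toy objective below are our own illustrative choices, not values from the paper.

```python
# Minimal NumPy sketch of the TiAda update (Algorithm 1, AdaGrad-style variant),
# written from the description above. Step sizes eta_x/eta_y, the initial
# accumulator value v0, and the toy objective are illustrative assumptions.
import numpy as np

def tiada(grad_x, grad_y, x, y, steps=1000,
          alpha=0.6, beta=0.4,      # exponents used in the paper's experiments
          eta_x=0.1, eta_y=0.1, v0=1e-8):
    vx, vy = v0, v0                 # per-player squared-gradient accumulators
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        vx += float(np.sum(gx ** 2))
        vy += float(np.sum(gy ** 2))
        # x adapts to the LARGER of the two accumulators with alpha > 1/2,
        # while y uses its own accumulator with beta < 1/2: this is the
        # time-scale adaptation that avoids tuning the step-size ratio.
        x = x - eta_x / max(vx, vy) ** alpha * gx
        y = y + eta_y / vy ** beta * gy
    return x, y

# Toy objective f(x, y) = x*y - 0.5*y**2 (strongly concave in y); x should
# approach 0, the minimizer of Phi(x) = max_y f(x, y) = 0.5*x**2.
x_out, y_out = tiada(lambda x, y: y,          # grad_x f
                     lambda x, y: x - y,      # grad_y f
                     x=np.array([1.0]), y=np.array([0.5]))
print(x_out, y_out)
```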
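The Adam hyper-parameters quoted for the GAN experiment map directly onto a standard PyTorch configuration. The sketch below is a hypothetical reconstruction for illustration only: the placeholder networks and the learning rate 1e-4 are our assumptions, while batch size 512, latent dimension 50, and betas = (0.5, 0.9) are the reported values; the actual architectures are described in Appendix A.1 of the paper.

```python
# Hedged PyTorch sketch of the GAN-baseline optimizer setup. Only batch_size,
# latent_dim, and the Adam betas come from the paper; the tiny networks and
# the learning rate are placeholder assumptions.
import torch
import torch.nn as nn

latent_dim = 50        # reported dimension of the latent variable
batch_size = 512       # reported batch size for the GAN experiment

# Placeholder models; the real architectures are in the paper's Appendix A.1.
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                          nn.Linear(128, 3 * 32 * 32), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(3 * 32 * 32, 128), nn.ReLU(),
                              nn.Linear(128, 1))

# Reported Adam moment parameters for the GAN experiment: beta1=0.5, beta2=0.9.
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.9))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))
```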