Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization

Authors: Junchi Yang, Xiang Li, Niao He

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerically, we further illustrate the robustness of the NeAda family with experiments on simple test functions and a real-world application.
Researcher Affiliation | Academia | Junchi Yang, Department of Computer Science, ETH Zurich, Switzerland (junchi.yang@inf.ethz.ch); Xiang Li, Department of Computer Science, ETH Zurich, Switzerland (xiang.li@inf.ethz.ch); Niao He, Department of Computer Science, ETH Zurich, Switzerland (niao.he@inf.ethz.ch)
Pseudocode | Yes | Algorithm 1: Non-nested Adaptive Method; Algorithm 2: Nested Adaptive (NeAda) Method; Algorithm 3: NeAda-AdaGrad; Algorithm 4: Generalized AdaGrad for Strongly-Convex Online Learning. (A hedged sketch of the nested structure follows the table.)
Open Source Code | Yes | We include the code in supplemental materials.
Open Datasets | Yes | Results on Synthetic Dataset. We use the same data generation process as in [71]. ... Results on MNIST Dataset. For MNIST, we use a convolutional neural network... [43]
Dataset Splits | No | The paper mentions 10000 training and 4000 test data points for the synthetic dataset, but does not explicitly provide validation-split information for either dataset in the main text.
Hardware Specification | No | Our experiments do not require large resource of computation.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | When the learning rates are set to different scales, i.e., η_x = 0.01, η_y = 0.08 (red curves in the figure), ... If we change the learning rates to the same scale, i.e., η_x = 0.01, η_y = 0.01 (blue curves in the figure), ... For NeAda, we use both stopping criterion I with stochastic gradient and criterion II in our experiments. For the results, we report the training loss and the test accuracy on adversarial samples generated from the fast gradient sign method (FGSM) [26]. FGSM can be formulated as x_adv = x + ε · sign(∇_x f(x)), where ε is the noise level. To get reasonable test accuracy, NeAda with Adam as subroutine is compared with Adam with 15 fixed inner-loop iterations, which is consistent with the choice of inner-loop steps in [71], and such a choice obtains much better test accuracy than the completely non-nested Adam. (A minimal FGSM sketch follows the table.)
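
To make the nested structure in the Pseudocode row concrete, here is a minimal Python sketch of a NeAda-style double loop: the inner loop adaptively maximizes over y until a stopping criterion is met, and the outer loop then takes one adaptive descent step on x. The AdaGrad-norm update, the 1/(t+1) inner accuracy schedule, and the iteration cap are illustrative assumptions, not the paper's exact Algorithms 2-3.

```python
import numpy as np

def neada_sketch(grad_x, grad_y, x0, y0, T, lr=1.0, eps=1e-8, max_inner=10_000):
    """Hedged sketch of a NeAda-style nested adaptive method.

    grad_x, grad_y: oracles returning the partial gradients of f(x, y).
    The inner stopping rule ||grad_y|| <= 1/(t+1) is a stand-in for the
    paper's increasing-accuracy criteria (criterion I / II).
    """
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    vx = vy = 0.0  # accumulated squared gradient norms (AdaGrad-norm)
    for t in range(T):
        # Inner loop: adaptive ascent on y until the target accuracy is met.
        for _ in range(max_inner):
            gy = grad_y(x, y)
            if np.linalg.norm(gy) <= 1.0 / (t + 1):
                break
            vy += float(np.dot(gy, gy))
            y = y + lr * gy / np.sqrt(vy + eps)   # ascent step on y
        # Outer step: one adaptive descent step on x at the (near-)maximizer y.
        gx = grad_x(x, y)
        vx += float(np.dot(gx, gx))
        x = x - lr * gx / np.sqrt(vx + eps)       # descent step on x
    return x, y
```

For example, on the toy objective f(x, y) = xy - y²/2, the call neada_sketch(lambda x, y: y, lambda x, y: x - y, np.ones(1), np.zeros(1), T=50) drives (x, y) toward the stationary point (0, 0) without any tuning of lr, which is the parameter-agnostic behavior the paper emphasizes.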
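
The FGSM formula quoted in the Experiment Setup row maps directly to code. The sketch below is a generic NumPy version; the gradient oracle grad_loss, the noise level value, and the [0, 1] clipping are assumptions for image data, not details taken from the paper.

```python
import numpy as np

def fgsm(x, grad_loss, eps=0.1):
    """Fast gradient sign method: x_adv = x + eps * sign(grad_x f(x)).

    grad_loss(x) returns the gradient of the loss w.r.t. the input x;
    eps is the noise level (0.1 is illustrative, not the paper's setting).
    Clipping keeps adversarial images in the valid [0, 1] pixel range.
    """
    x_adv = x + eps * np.sign(grad_loss(x))
    return np.clip(x_adv, 0.0, 1.0)
```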