Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization
Authors: Junchi Yang, Xiang Li, Niao He
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerically, we further illustrate the robustness of the NeAda family with experiments on simple test functions and a real-world application. |
| Researcher Affiliation | Academia | Junchi Yang Department of Computer Science ETH Zurich, Switzerland junchi.yang@inf.ethz.ch Xiang Li Department of Computer Science ETH Zurich, Switzerland xiang.li@inf.ethz.ch Niao He Department of Computer Science ETH Zurich, Switzerland niao.he@inf.ethz.ch |
| Pseudocode | Yes | Algorithm 1 Non-nested Adaptive Method, Algorithm 2 Nested Adaptive (NeAda) Method, Algorithm 3 NeAda-AdaGrad, Algorithm 4 Generalized AdaGrad for Strongly-convex Online Learning |
| Open Source Code | Yes | We include the code in supplemental materials. |
| Open Datasets | Yes | Results on Synthetic Dataset. We use the same data generation process as in [71]. ... Results on MNIST Dataset. For MNIST, we use a convolutional neural network... [43] |
| Dataset Splits | No | The paper mentions 10000 training and 4000 test data points for the synthetic dataset, but does not explicitly provide validation split information for either dataset in the main text. |
| Hardware Specification | No | Our experiments do not require large computational resources. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | When the learning rates are set to different scales, i.e., η_x = 0.01, η_y = 0.08 (red curves in the figure),... If we change the learning rates to the same scale, i.e., η_x = 0.01, η_y = 0.01 (blue curves in the figure),... For NeAda, we use both stopping criterion I with stochastic gradient and criterion II in our experiments. For the results, we report the training loss and the test accuracy on adversarial samples generated from the fast gradient sign method (FGSM) [26]. FGSM can be formulated as x_adv = x + ε · sign(∇_x f(x)), where ε is the noise level. To get reasonable test accuracy, NeAda with Adam as subroutine is compared with Adam with fixed 15 inner loop iterations, which is consistent with the choice of inner loop steps in [71], and such a choice obtains much better test accuracy than the completely non-nested Adam. |
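
For concreteness, the FGSM perturbation quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration assuming a PyTorch-style model and loss function; the function name `fgsm_attack` and its arguments are hypothetical and not taken from the paper's code.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps):
    """Sketch of FGSM: x_adv = x + eps * sign(grad_x loss), per the formula above."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # Perturb each input in the direction of the sign of its input gradient,
    # with eps controlling the noise level.
    x_adv = x + eps * x.grad.sign()
    return x_adv.detach()
```

In the reported setup, adversarial test samples of this form are used to evaluate accuracy for NeAda with Adam as subroutine against non-nested Adam baselines.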