Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization
Authors: Jin Liu, Xiaokang Pan, Junwen Duan, Hong-Dong Li, Youqi Li, Zhe Qu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper states "Extensive experiments support the efficiency of our proposed methods" and "Extensive experimental results support the effectiveness of our proposed methods." The "Experiments" section describes datasets, performance evaluation, Figures 1-6, and Table 1. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Central South University, Changsha, China; (2) School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China. |
| Pseudocode | Yes | The paper provides "Algorithm 1: Illustration of NSTORM method" and "Algorithm 2: Illustration of ADA-NSTORM method" (see the estimator sketch after this table). |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to a repository. |
| Open Datasets | Yes | We employ four distinct image classification datasets in our study: CAT vs DOG, CIFAR10, CIFAR100 (Krizhevsky 2009), and STL10 (Coates, Ng, and Lee 2011). (See the data-loading sketch after this table.) |
| Dataset Splits | No | The paper lists the four datasets (CAT vs DOG, CIFAR10, CIFAR100, STL10) but does not explicitly provide the training/validation/test splits (percentages or counts) used for the experiments. It mentions following a methodology for creating imbalanced variants, but not the exact split proportions. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide ancillary software details with version numbers (e.g., library or solver names and versions). |
| Experiment Setup | Yes | Weight decay was consistently set to 1e-4. Each method was trained with batch size 128 for 100 epochs. Parameter m was varied over {50, 500, 5000} and γ over {1, 0.9, 0.5}; β was set to 0.9. The learning rate ηt was reduced by a factor of 10 at 50% and 75% of training. For robustness, each experiment was run three times with distinct seeds, reporting means and standard deviations (the reported hyperparameters are collected in the configuration sketch after this table). |
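
The paper's NSTORM and ADA-NSTORM pseudocode is not reproduced in this report. As background only, the snippet below is a minimal sketch of the STORM-style recursive momentum gradient estimator that such variance-reduced methods build on; it is not the paper's exact update, and the function name and toy tensors are illustrative.

```python
import torch

def storm_estimator(grad_t, grad_prev_same_batch, d_prev, a):
    """Recursive momentum estimator:
        d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)),
    where both stochastic gradients are evaluated on the same minibatch xi_t.
    """
    return grad_t + (1.0 - a) * (d_prev - grad_prev_same_batch)

# Toy usage with random tensors standing in for stochastic gradients.
g_t = torch.randn(10)
g_prev = torch.randn(10)
d_prev = torch.zeros(10)
d_t = storm_estimator(g_t, g_prev, d_prev, a=0.1)
```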
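
CIFAR10, CIFAR100, and STL10 are standard benchmarks available through torchvision.datasets; the sketch below shows one way to fetch them. The CAT vs DOG dataset is not bundled with torchvision, and the download root and transform here are arbitrary choices, not the paper's data pipeline.

```python
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor()])

# CIFAR10 / CIFAR100 / STL10 ship with torchvision; the "./data" root is arbitrary.
cifar10_train = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
cifar10_test = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
cifar100_train = torchvision.datasets.CIFAR100("./data", train=True, download=True, transform=transform)
stl10_train = torchvision.datasets.STL10("./data", split="train", download=True, transform=transform)
stl10_test = torchvision.datasets.STL10("./data", split="test", download=True, transform=transform)
```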
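
The hyperparameters reported in the Experiment Setup row can be collected into a small configuration, as sketched below. The optimizer (plain SGD as a stand-in for the paper's methods), the placeholder model, and the initial learning rate of 0.1 are assumptions; only the quoted numbers (weight decay, batch size, epochs, m, γ, β, the milestone schedule, and the three seeds) come from the row above.

```python
import torch

# Values reported in the paper's experiment setup (see the table row above).
config = {
    "weight_decay": 1e-4,
    "batch_size": 128,
    "epochs": 100,
    "m": [50, 500, 5000],    # values swept in the experiments
    "gamma": [1, 0.9, 0.5],  # values swept in the experiments
    "beta": 0.9,
    "seeds": 3,              # runs per setting, reporting mean and std
}

model = torch.nn.Linear(10, 2)  # placeholder model, not the paper's network
# SGD and lr=0.1 are illustrative stand-ins, not the paper's optimizer/learning rate.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            weight_decay=config["weight_decay"])
# "Learning rate reduced by a factor of 10 at 50% and 75% of training":
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[config["epochs"] // 2, int(config["epochs"] * 0.75)],
    gamma=0.1,
)
```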