Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization

Authors: Jin Liu, Xiaokang Pan, Junwen Duan, Hong-Dong Li, Youqi Li, Zhe Qu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states "Extensive experiments support the efficiency of our proposed methods." and "Extensive experimental results support the effectiveness of our proposed methods." The "Experiments" section describes datasets, performance evaluation, Figures 1-6, and Table 1.
Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Central South University, Changsha, China; (2) School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.
Pseudocode | Yes | The paper provides "Algorithm 1: Illustration of NSTORM method" and "Algorithm 2: Illustration of ADA-NSTORM method". A generic sketch of the STORM-style estimator these methods build on is given after the table.
Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to a repository.
Open Datasets | Yes | "We employ four distinct image classification datasets in our study: CAT vs DOG, CIFAR10, CIFAR100 (Krizhevsky 2009), and STL10 (Coates, Ng, and Lee 2011)."
Dataset Splits | No | While standard, public datasets are used, the paper does not explicitly provide the training/validation/test splits (percentages or counts) used for the experiments. It mentions following a methodology for creating imbalanced variants but not the exact split proportions.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU or CPU models or memory specifications, used for running its experiments.
Software Dependencies | No | The paper does not provide ancillary software details with version numbers (e.g., library or solver names and versions).
Experiment Setup | Yes | Weight decay was set to 1e-4 throughout. Each method was trained with batch size 128 for 100 epochs. The parameter m was varied over {50, 500, 5000} and γ over {1, 0.9, 0.5}. The learning rate η_t was reduced by a factor of 10 at 50% and 75% of training, and β was set to 0.9. For robustness, each experiment was run three times with distinct seeds, reporting means and standard deviations. A hypothetical configuration sketch based on these settings follows the table.
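
The following is a minimal, hypothetical Python sketch of the reported training configuration. The dictionary layout and the helper name lr_at_epoch are illustrative choices, and the assumption that the learning-rate reduction is cumulative at each milestone is not stated in the paper; the base learning rate, optimizer, and exact seed values are not reported.

```python
# Hypothetical sketch of the reported setup; key names and the helper below
# are illustrative, not taken from the paper.
config = {
    "weight_decay": 1e-4,
    "batch_size": 128,
    "epochs": 100,
    "m_values": [50, 500, 5000],    # values swept for the parameter m
    "gamma_values": [1.0, 0.9, 0.5],
    "beta": 0.9,
    "num_seeds": 3,                 # each experiment repeated with three distinct seeds
}

def lr_at_epoch(base_lr: float, epoch: int, total_epochs: int = 100) -> float:
    """Step schedule: cut the learning rate by 10x at 50% and 75% of training.

    Assumes the reduction is cumulative at each milestone; the report only
    says the rate is "reduced by 10" at those points.
    """
    lr = base_lr
    if epoch >= 0.5 * total_epochs:
        lr /= 10.0
    if epoch >= 0.75 * total_epochs:
        lr /= 10.0
    return lr
```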
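
For readers unfamiliar with the variance-reduction family that NSTORM and ADA-NSTORM belong to, here is a minimal sketch of the generic STORM-style momentum estimator such methods build on. This is not the paper's pseudocode: NSTORM additionally maintains nested estimators for the compositional inner function and a max-player update, and the mapping of the momentum weight here to the paper's β is an assumption. Names such as storm_estimator and grad_fn are illustrative.

```python
import numpy as np

def storm_estimator(grad_fn, x, x_prev, d_prev, beta, sample):
    """One step of a generic STORM-style momentum variance-reduced estimator:

        d_t = g(x_t; xi_t) + (1 - beta) * (d_{t-1} - g(x_{t-1}; xi_t))

    The correction term reuses the same sample at the previous iterate, which
    is what reduces the estimator's variance. This is the classical STORM
    recursion, not the paper's exact NSTORM/ADA-NSTORM update.
    """
    g_curr = grad_fn(x, sample)       # stochastic gradient at the current point
    g_prev = grad_fn(x_prev, sample)  # same sample, previous point
    return g_curr + (1.0 - beta) * (d_prev - g_prev)

# Toy usage on f(x) = 0.5 * ||x||^2 with additive sampling noise.
rng = np.random.default_rng(0)
grad_fn = lambda x, sample: x + sample              # noisy gradient oracle
x_prev = rng.normal(size=5)
x = x_prev.copy()
d = grad_fn(x, rng.normal(scale=0.1, size=5))       # initialize with a plain stochastic gradient
lr, beta = 0.1, 0.9                                  # beta here is only the recursion's momentum weight
for _ in range(100):
    sample = rng.normal(scale=0.1, size=5)
    d = storm_estimator(grad_fn, x, x_prev, d, beta, sample)
    x_prev, x = x, x - lr * d                        # descent step with the variance-reduced direction
```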