Jointly Improving the Sample and Communication Complexities in Decentralized Stochastic Minimax Optimization

Authors: Xuan Zhang, Gabriel Mancino-Ball, Necdet Serhat Aybat, Yangyang Xu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical Experiments: We test our proposed method on three problems: a quadratic minimax problem, robust non-convex linear regression, and robust neural network training. For the first and third problems, we let M = 8 so that each agent is represented by an NVIDIA Tesla V100 GPU. For the second problem, we test methods in a serial manner to facilitate more general reproducibility; here, we let M = 20. In all cases, we use a ring (cycle) graph with equal weights on edges, including self-loops, i.e., w_{i,i-1} = w_{i,i} = w_{i,i+1} = 1/3 for all i ∈ [M]. (A sketch of this ring mixing matrix appears after the table.) The learning rates for all tests are chosen such that η_y ∈ {10⁻¹, 10⁻², 10⁻³}, and we tune the ratio η_x/η_y ∈ {1, 10⁻¹, 10⁻², 10⁻³}. We test our proposed method against three methods: DPOSG (Liu et al. 2020), DM-HSGD (Xian et al. 2021), and the deterministic GT/DA (Tsaknakis, Hong, and Liu 2020).
Researcher Affiliation | Academia | Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA; Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180
Pseudocode | Yes | Algorithm 1: DGDA-VR
Open Source Code | Yes | The code is made available at https://github.com/gmancino/DGDA-VR.
Open Datasets | Yes | Inspired by (Deng and Mahdavi 2021), we adopt g_{x_i} corresponding to a two-layer network (200 hidden units) with a tanh activation function, and we use the MNIST (LeCun 1998) dataset for training. (A sketch of the described network appears after the table.)
Dataset Splits | No | The paper mentions using specific datasets (MNIST, a9a, ijcnn1) for training and testing but does not provide explicit training/validation/test splits, percentages, or sample counts, nor does it refer to standard predefined splits for these datasets.
Hardware Specification | Yes | For the first and third problems, we let M = 8 so that each agent is represented by an NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for its implementation, such as Python, PyTorch, or TensorFlow versions.
Experiment Setup | Yes | The learning rates for all tests are chosen such that η_y ∈ {10⁻¹, 10⁻², 10⁻³}, and we tune the ratio η_x/η_y ∈ {1, 10⁻¹, 10⁻², 10⁻³}. We test our proposed method against three methods: DPOSG (Liu et al. 2020), DM-HSGD (Xian et al. 2021), and the deterministic GT/DA (Tsaknakis, Hong, and Liu 2020)... For our proposed method, we set q = S1 = 100... We fix the mini-batch to be 32 for all methods besides GT/DA and set S1 = 1,000, q = 32 for our method... We fix the mini-batch size for all methods to be 100 (besides GT/DA). For DGDA-VR, we set q = 100 and S1 = 7,500. (A sketch of this tuning grid appears after the table.)
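
The ring-graph communication topology quoted above is fully specified by its mixing matrix. Below is a minimal sketch of how that matrix can be constructed; it is not taken from the authors' repository, and the function name is illustrative.

```python
import numpy as np

def ring_mixing_matrix(M: int) -> np.ndarray:
    """Mixing matrix for a ring (cycle) graph with self-loops,
    w_{i,i-1} = w_{i,i} = w_{i,i+1} = 1/3, as quoted above.
    Assumes M >= 3 so each agent has two distinct neighbors."""
    W = np.zeros((M, M))
    for i in range(M):
        W[i, i] = 1.0 / 3.0            # self-loop weight
        W[i, (i - 1) % M] = 1.0 / 3.0  # left neighbor
        W[i, (i + 1) % M] = 1.0 / 3.0  # right neighbor
    return W

W = ring_mixing_matrix(8)  # M = 8 agents, as in problems 1 and 3
# Symmetric with unit row sums, hence doubly stochastic.
assert np.allclose(W, W.T) and np.allclose(W.sum(axis=1), 1.0)
```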
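The Open Datasets row describes a two-layer network with 200 hidden units and tanh activation trained on MNIST. The following sketch is one plausible reading of that description; the input/output dimensions (28x28 images, 10 classes) are standard MNIST assumptions, not details given in the quote.

```python
import torch.nn as nn

# A minimal sketch of the described two-layer tanh network for MNIST.
# The hidden width (200) is from the quote; layer sizes are assumed.
model = nn.Sequential(
    nn.Flatten(),             # 28x28 image -> 784-dim vector
    nn.Linear(28 * 28, 200),  # first layer: 200 hidden units
    nn.Tanh(),                # tanh activation, as described
    nn.Linear(200, 10),       # second layer: 10 MNIST classes
)
```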
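The Experiment Setup row implies a grid of 12 learning-rate pairs per problem, combined with per-problem variance-reduction settings (q, S1) and mini-batch sizes. The sketch below only enumerates that grid; all variable names are hypothetical, and the quadratic problem's batch size is left as None because the quote does not state it.

```python
from itertools import product

# Grid values quoted above; names are illustrative, not the authors'.
eta_y_grid = [1e-1, 1e-2, 1e-3]        # η_y ∈ {10⁻¹, 10⁻², 10⁻³}
ratio_grid = [1.0, 1e-1, 1e-2, 1e-3]   # η_x/η_y ∈ {1, 10⁻¹, 10⁻², 10⁻³}

# Per-problem DGDA-VR settings as quoted in the table row.
problems = {
    "quadratic_minimax":  dict(q=100, S1=100,  batch=None),
    "robust_regression":  dict(q=32,  S1=1000, batch=32),
    "robust_nn_training": dict(q=100, S1=7500, batch=100),
}

configs = []
for name, settings in problems.items():
    for eta_y, ratio in product(eta_y_grid, ratio_grid):
        configs.append({"problem": name, "eta_y": eta_y,
                        "eta_x": ratio * eta_y, **settings})

print(len(configs))  # 3 problems x 12 learning-rate pairs = 36 runs
```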