Jointly Improving the Sample and Communication Complexities in Decentralized Stochastic Minimax Optimization
Authors: Xuan Zhang, Gabriel Mancino-Ball, Necdet Serhat Aybat, Yangyang Xu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical Experiments: We test our proposed method on three problems: a quadratic minimax problem, robust non-convex linear regression, and robust neural network training. For the first and third problems, we let M = 8 such that each agent is represented by an NVIDIA Tesla V100 GPU. For the second problem, we test methods in a serial manner to facilitate more general reproducibility; here, we let M = 20. In all cases, we use a ring (cycle) graph with equal weights on edges including self-loops, i.e., w_{i,i-1} = w_{i,i} = w_{i,i+1} = 1/3 for all i ∈ [M]. The learning rates for all tests are chosen such that η_y ∈ {10^{-1}, 10^{-2}, 10^{-3}} and we tune the ratio η_x/η_y ∈ {1, 10^{-1}, 10^{-2}, 10^{-3}}. We test our proposed method against 3 methods: DPOSG (Liu et al. 2020), DM-HSGD (Xian et al. 2021), and the deterministic GT/DA (Tsaknakis, Hong, and Liu 2020). (See the mixing-matrix sketch after the table.) |
| Researcher Affiliation | Academia | Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA; Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 |
| Pseudocode | Yes | Algorithm 1: DGDA-VR |
| Open Source Code | Yes | The code is made available at https://github.com/gmancino/DGDA-VR. |
| Open Datasets | Yes | Inspired by (Deng and Mahdavi 2021), we adopt g_{x_i} corresponding to a two-layer network (200 hidden units) with a tanh activation function, and we use the MNIST (LeCun 1998) dataset for training. |
| Dataset Splits | No | The paper mentions using specific datasets (MNIST, a9a, ijcnn1) for training and testing but does not provide explicit training/validation/test splits, percentages, or sample counts, nor does it refer to standard predefined splits for these datasets. |
| Hardware Specification | Yes | For the first and third problem, we let M = 8 such that each agent is represented by an NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for its implementation, such as Python, PyTorch, or TensorFlow versions. |
| Experiment Setup | Yes | The learning rates for all tests are chosen such that η_y ∈ {10^{-1}, 10^{-2}, 10^{-3}} and we tune the ratio η_x/η_y ∈ {1, 10^{-1}, 10^{-2}, 10^{-3}}. We test our proposed method against 3 methods: DPOSG (Liu et al. 2020), DM-HSGD (Xian et al. 2021), and the deterministic GT/DA (Tsaknakis, Hong, and Liu 2020)... For our proposed method, we set q = S1 = 100... We fix the mini-batch to be 32 for all methods besides GT/DA and set S1 = 1,000, q = 32 for our method... We fix the mini-batch size for all methods to be 100 (besides GT/DA). For DGDA-VR, we set q = 100 and S1 = 7,500. (See the tuning-grid and estimator sketches after the table.) |
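
The ring-graph topology quoted in the Research Type row is simple to reconstruct. Below is a minimal sketch, assuming numpy; M = 8 and the uniform 1/3 edge weights come from the paper, while the function name `ring_mixing_matrix` and the doubly-stochastic check are illustrative.

```python
import numpy as np

def ring_mixing_matrix(M: int) -> np.ndarray:
    """Mixing matrix for a cycle graph with self-loops:
    w[i, i-1] = w[i, i] = w[i, i+1] = 1/3 (indices mod M)."""
    W = np.zeros((M, M))
    for i in range(M):
        W[i, (i - 1) % M] = 1.0 / 3.0  # left neighbor
        W[i, i] = 1.0 / 3.0            # self-loop
        W[i, (i + 1) % M] = 1.0 / 3.0  # right neighbor
    return W

W = ring_mixing_matrix(8)  # M = 8 agents, as in problems 1 and 3
# Equal weights make W doubly stochastic, a standard requirement
# for consensus/gradient-tracking schemes.
assert np.allclose(W.sum(axis=0), 1.0) and np.allclose(W.sum(axis=1), 1.0)
```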
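The learning-rate search quoted in the Experiment Setup row amounts to a 12-point grid over η_y and the ratio η_x/η_y. Here is a hypothetical sketch of that sweep; `run_trial` is a placeholder for a full training run and is not part of the authors' code.

```python
from itertools import product

# Grids from the paper: eta_y and the ratio eta_x / eta_y.
ETA_Y_GRID = [1e-1, 1e-2, 1e-3]
RATIO_GRID = [1.0, 1e-1, 1e-2, 1e-3]

def tune(run_trial):
    """run_trial(eta_x, eta_y) -> scalar metric (lower is better)."""
    best_params, best_score = None, float("inf")
    for eta_y, ratio in product(ETA_Y_GRID, RATIO_GRID):
        eta_x = ratio * eta_y
        score = run_trial(eta_x, eta_y)
        if score < best_score:
            best_params, best_score = (eta_x, eta_y), score
    return best_params, best_score
```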
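The parameters q and S1 in the setup (e.g., q = S1 = 100 for the quadratic problem, or q = 32 with S1 = 1,000) are consistent with a SPIDER-style variance-reduced gradient estimator: a large batch of S1 samples refreshes the estimate every q iterations, with small-batch gradient differences in between. The sketch below shows only that estimator pattern, not Algorithm 1 (DGDA-VR) itself; `grad` and `sample` are hypothetical placeholders.

```python
def spider_estimate(t, q, S1, b, params, prev_params, v_prev, grad, sample):
    """SPIDER-style gradient estimate at iteration t.

    grad(params, batch) -> stochastic gradient on `batch` (placeholder),
    sample(n)           -> draw a batch of n samples (placeholder).
    """
    if t % q == 0:
        # Periodic refresh with a large batch of S1 samples.
        return grad(params, sample(S1))
    # Recursive update: reuse the same small batch (e.g., b = 32)
    # for both gradient evaluations, as SPIDER requires.
    batch = sample(b)
    return grad(params, batch) - grad(prev_params, batch) + v_prev
```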