Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm
Authors: Miaoxi Zhu, Li Shen, Bo Du, Dacheng Tao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, we perform several numerical experiments which validate our theoretical findings. |
| Researcher Affiliation | Collaboration | Miaoxi Zhu (1), Li Shen (2)*, Bo Du (1), Dacheng Tao (3). (1) School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China; (2) JD Explore Academy, China; (3) The University of Sydney, Australia |
| Pseudocode | Yes | Algorithm 1 D-SGDA (a minimal sketch of the update appears after this table) |
| Open Source Code | No | Our implementation is highly based on the two source codes: https://github.com/zhenhuan-yang/minimax-stability and https://github.com/Raiden-Zhu/Generalization-of-DSGD |
| Open Datasets | Yes | We evaluate our theoretical results of the C-C case by adopting the SOLAM method [41] to solve the AUC problem on two datasets svmguide and w5a, and the NC-NC case by solving the generative adversarial network on MNIST. |
| Dataset Splits | No | The paper mentions "smaller difference between training dataset and validation dataset" but does not specify explicit splits (e.g., percentages or absolute counts) for these sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper states its implementation is based on two source codes but does not explicitly list software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Our experimental setting follows the setup in [13, 19] to study how the stability and generalization of D-SGDA behave along the learning process under different factors, including learning rates, topologies, numbers of nodes, and sample sizes. ... The Leaky ReLU is applied before the output layer. ... We use 3 different seeds and 3 different ways to construct S, i.e., changing the observations (9 runs in total). |
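
The "Pseudocode" row above refers to the paper's Algorithm 1 (D-SGDA), which is not reproduced verbatim here. As a reading aid, the following is a minimal NumPy sketch of one synchronous decentralized SGDA round in the common "mix, then local step" form. The mixing matrix `W`, the step sizes `eta_x`/`eta_y`, and the `grad_x`/`grad_y` callables are assumptions of this sketch rather than the authors' interface, and D-SGDA variants differ in details such as whether gradients are evaluated at the pre- or post-mixing iterates.

```python
import numpy as np

def d_sgda_round(X, Y, W, grad_x, grad_y, eta_x, eta_y, rng):
    """One synchronous round of decentralized SGDA over m nodes
    (a sketch under assumptions, not the authors' reference code).

    X, Y : (m, d) arrays; row i holds node i's copies of the
           min variable x and max variable y.
    W    : (m, m) doubly stochastic mixing matrix encoding the topology.
    grad_x, grad_y : callables (i, x, y, rng) -> stochastic gradients
           on node i's local sample (hypothetical interface).
    """
    # Gossip step: every node averages the iterates of its neighbors
    # according to the mixing weights in W.
    X_mix, Y_mix = W @ X, W @ Y
    X_new, Y_new = np.empty_like(X), np.empty_like(Y)
    for i in range(X.shape[0]):
        # Local step: descent on the min variable, ascent on the max
        # variable, each using a fresh local stochastic gradient.
        X_new[i] = X_mix[i] - eta_x * grad_x(i, X[i], Y[i], rng)
        Y_new[i] = Y_mix[i] + eta_y * grad_y(i, X[i], Y[i], rng)
    return X_new, Y_new
```

The doubly stochastic `W` is the lever behind the "topologies" factor in the experiment setup: a ring, grid, or fully connected graph changes the spectral gap of `W`, which governs how quickly the local iterates reach consensus and is the quantity through which topology typically enters stability analyses of decentralized methods.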