Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm

Authors: Miaoxi Zhu, Li Shen, Bo Du, Dacheng Tao

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Additionally, we perform several numerical experiments which validate our theoretical findings." |
| Researcher Affiliation | Collaboration | Miaoxi Zhu¹, Li Shen²*, Bo Du¹, Dacheng Tao³. ¹School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China; ²JD Explore Academy, China; ³The University of Sydney, Australia |
| Pseudocode | Yes | Algorithm 1: D-SGDA (a minimal sketch is given after the table) |
| Open Source Code | No | "Our implementation is highly based on the two source codes": https://github.com/zhenhuan-yang/minimax-stability and https://github.com/Raiden-Zhu/Generalization-of-DSGD |
| Open Datasets | Yes | "We evaluate our theoretical results of the C-C case by adopting the SOLAM method [41] to solve the AUC problem on two datasets svmguide and w5a, and the NC-NC case by solving the generative adversarial network on MNIST." (the SOLAM saddle reformulation is sketched after the table) |
| Dataset Splits | No | The paper mentions a "smaller difference between training dataset and validation dataset" but does not specify explicit splits (e.g., percentages or absolute counts) for these sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper states its implementation is based on two source codes but does not explicitly list software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | "Our experimental setting follows the way conducted in [13, 19] to study how the stability and generalizability of D-SGDA would behave along the learning process with different factors, including learning rates, topologies, nodes, and sample sizes. ... The leaky ReLU is taken before the output layer. ... We take 3 different seeds and 3 different ways to construct S, which means changing different observations (total 9 runs)." (see the topology and run-grid sketch after the table) |
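For readers skimming the Pseudocode row: D-SGDA interleaves local descent/ascent steps with gossip averaging over a communication topology. The NumPy sketch below shows one plausible, minimal form of such an update; the function name `d_sgda`, its signature, and the gossip-before-step ordering are illustrative assumptions, not a transcription of the paper's exact Algorithm 1.

```python
import numpy as np

def d_sgda(grad_x, grad_y, W, x0, y0, lr_x, lr_y, T, rng):
    """Minimal decentralized SGDA loop over n nodes mixed by W.

    grad_x(i, x, y, rng) and grad_y(i, x, y, rng) return stochastic
    gradients of node i's local objective at (x, y); W is an (n, n)
    doubly stochastic matrix encoding the communication topology.
    """
    n = W.shape[0]
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))  # one primal iterate per node
    Y = np.tile(np.asarray(y0, dtype=float), (n, 1))  # one dual iterate per node
    for _ in range(T):
        Gx = np.stack([grad_x(i, X[i], Y[i], rng) for i in range(n)])
        Gy = np.stack([grad_y(i, X[i], Y[i], rng) for i in range(n)])
        X = W @ X - lr_x * Gx  # gossip average, then descent step in x
        Y = W @ Y + lr_y * Gy  # gossip average, then ascent step in y
    return X.mean(axis=0), Y.mean(axis=0)  # network-average iterates
```

Setting `W = np.full((n, n), 1.0 / n)` recovers fully connected averaging; sparser doubly stochastic matrices model the ring and other topologies varied in the experiments.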
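The C-C experiment rests on the fact that AUC maximization admits a convex-concave saddle reformulation, which is what makes SOLAM applicable. In the commonly cited form of the SOLAM reformulation from [41] (reproduced here as a reading aid, with p the probability of the positive class; constants should be checked against [41]), one solves

```latex
\min_{\mathbf{w},\, a,\, b} \; \max_{\alpha \in \mathbb{R}} \;
\mathbb{E}_{z=(\mathbf{x},\, y)} \Big[
  (1-p)\,(\mathbf{w}^{\top}\mathbf{x} - a)^{2}\,\mathbb{I}[y=1]
  + p\,(\mathbf{w}^{\top}\mathbf{x} - b)^{2}\,\mathbb{I}[y=-1]
  + 2(1+\alpha)\,\mathbf{w}^{\top}\mathbf{x}\,
    \big(p\,\mathbb{I}[y=-1] - (1-p)\,\mathbb{I}[y=1]\big)
  - p(1-p)\,\alpha^{2}
\Big]
```

which is convex in (w, a, b) and concave in α, matching the convex-concave regime that the paper's stability bounds cover.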
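The quoted setup varies topologies and repeats each configuration over 3 seeds × 3 constructions of the training set S (9 runs). A small sketch of how such a grid might be wired up; the uniform 1/3 ring weights and the placeholder seed and draw values are assumptions, not the paper's exact configuration.

```python
import numpy as np
from itertools import product

def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix for a ring of n >= 3 nodes.

    The uniform 1/3 self/neighbor weights are a standard assumed choice,
    not the paper's exact gossip weights.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

# 3 random seeds x 3 draws of the training set S = 9 runs, matching the
# quoted protocol; the concrete values below are placeholders.
seeds = (0, 1, 2)
s_draws = ("draw_a", "draw_b", "draw_c")
runs = list(product(seeds, s_draws))
assert len(runs) == 9
```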