Stability and Generalization of Asynchronous SGD: Sharper Bounds Beyond Lipschitz and Smoothness

Authors: Xiaoge Deng, Tao Sun, Shengwei Li, Dongsheng Li, Xicheng Lu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we validate our theoretical findings by training numerous machine learning models, including convex problems and non-convex tasks in computer vision and natural language processing.
Researcher Affiliation | Academia | Xiaoge Deng, Tao Sun, Shengwei Li, Dongsheng Li, Xicheng Lu; College of Computer Science and Technology, National University of Defense Technology, China; dengxg@nudt.edu.cn, suntao.saltfish@outlook.com, lucasleesw9@gmail.com, dsli@nudt.edu.cn, xclu@nudt.edu.cn
Pseudocode | Yes | The ASGD procedure is described in Algorithm 1 (located in Appendix A.1).
Open Source Code | Yes | We have submitted the source code in the Supplementary Material and provided sufficient instructions for usage in the README.md file.
Open Datasets | Yes | For the convex optimization problem, we employed a single-layer linear network with the mean squared error for a classification task on the RCV1 data set from the LIBSVM database [10].
Dataset Splits | No | The paper mentions training data and test datasets but does not explicitly provide information on training/validation/test splits, such as percentages or sample counts for each split.
Hardware Specification | Yes | All of our experiments were implemented with PyTorch on Nvidia RTX-3090 24 GB GPUs.
Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework but does not specify its version number or other software dependencies with version numbers.
Experiment Setup | Yes | Following our theoretical findings, we set the learning rate to 0.1/τ for different delays, where τ denotes the average delay. (A minimal sketch of this setup follows the table.)
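As context for the Pseudocode and Experiment Setup rows above, the following is a minimal, self-contained sketch (not the authors' released code) of asynchronous SGD modeled as SGD with stale gradients, using the 0.1/τ learning-rate rule quoted in the table. The synthetic least-squares data, the delay distribution, and every hyperparameter other than the 0.1/τ rule are illustrative assumptions; the paper's convex experiment instead trains a single-layer linear model with MSE on the RCV1 dataset from LIBSVM.

```python
import numpy as np

# Sketch of asynchronous SGD as delayed-gradient SGD:
#   w_{t+1} = w_t - eta * grad f_i(w_{t - tau_t}),
# on a least-squares (single linear layer + MSE) problem with synthetic data.

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

max_delay = 8                 # assumed maximum gradient staleness
avg_delay = max_delay / 2     # tau: average delay
eta = 0.1 / avg_delay         # learning rate 0.1 / tau, as in the paper's setup
T = 5000                      # number of asynchronous updates (assumed)

w = np.zeros(d)
history = [w.copy()]          # buffer of recent iterates, used to form stale gradients

for t in range(T):
    tau_t = rng.integers(0, min(max_delay, len(history)))  # random staleness for this update
    w_stale = history[-1 - tau_t]                          # iterate the "worker" read tau_t steps ago
    i = rng.integers(n)                                    # sampled training example
    grad = 2.0 * (X[i] @ w_stale - y[i]) * X[i]            # per-sample MSE gradient at the stale iterate
    w = w - eta * grad                                     # server applies the delayed gradient
    history.append(w.copy())
    history = history[-(max_delay + 1):]                   # keep only iterates that can still be read

print("final training MSE:", np.mean((X @ w - y) ** 2))
```

The stale-iterate buffer is only a single-process stand-in for worker asynchrony: in a real distributed run, the delay τ_t arises from workers computing gradients on parameters that the server has since updated, which is the setting the paper's Algorithm 1 formalizes.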