Stability and Generalization of Asynchronous SGD: Sharper Bounds Beyond Lipschitz and Smoothness
Authors: Xiaoge Deng, Tao Sun, Shengwei Li, Dongsheng Li, Xicheng Lu
Venue: NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our theoretical findings by training numerous machine learning models, including convex problems and non-convex tasks in computer vision and natural language processing. |
| Researcher Affiliation | Academia | Xiaoge Deng, Tao Sun, Shengwei Li, Dongsheng Li, Xicheng Lu; College of Computer Science and Technology, National University of Defense Technology, China; dengxg@nudt.edu.cn, suntao.saltfish@outlook.com, lucasleesw9@gmail.com, dsli@nudt.edu.cn, xclu@nudt.edu.cn |
| Pseudocode | Yes | The ASGD procedure is described in Algorithm 1 (located in Appendix A.1). |
| Open Source Code | Yes | We have submitted the source code in the Supplementary Material and provided sufficient instructions for usage in the README.md file. |
| Open Datasets | Yes | For the convex optimization problem, we employed a single-layer linear network with the mean squared error for a classification task on the RCV1 data set from the LIBSVM database [10]. |
| Dataset Splits | No | The paper mentions training data and test datasets but does not explicitly provide information on training/validation/test splits, such as percentages or sample counts for each split. |
| Hardware Specification | Yes | All of our experiments were implemented with PyTorch on Nvidia RTX-3090 24 GB GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework but does not specify its version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | Following our theoretical findings, we set the learning rate to 0.1/τ for different delays, where τ denotes the average delay. |
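
The Pseudocode row points to the ASGD procedure in the paper's Appendix A.1. For orientation, the sketch below simulates the standard delayed-gradient ASGD iteration on a toy least-squares problem; the delay model, objective, step size, and all variable names are assumptions made for illustration, not the authors' Algorithm 1.

```python
import numpy as np

# Minimal simulation of asynchronous SGD (ASGD) with stale gradients on a toy
# least-squares objective f(w) = 0.5 * ||A w - b||^2. The delay model, the
# objective, and all names here are illustrative assumptions; the authors'
# actual procedure is Algorithm 1 in their Appendix A.1.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
b = rng.standard_normal(100)

def stochastic_grad(w, idx):
    """Gradient of the loss on a single sample idx."""
    a_i = A[idx]
    return (a_i @ w - b[idx]) * a_i

max_delay = 4       # assumed maximum staleness of a gradient
lr = 0.01           # placeholder step size
num_steps = 2000

w = np.zeros(10)
iterates = [w.copy()]                                 # keep old iterates so stale reads are possible
for t in range(num_steps):
    delay = rng.integers(0, min(max_delay, t) + 1)    # staleness tau_t of this update
    w_stale = iterates[-(delay + 1)]                  # a worker read an outdated copy of w
    idx = rng.integers(0, A.shape[0])                 # the worker sampled one data point
    w = w - lr * stochastic_grad(w_stale, idx)        # the server applies the stale gradient
    iterates.append(w.copy())

print("final objective:", 0.5 * np.linalg.norm(A @ w - b) ** 2)
```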
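
The Experiment Setup row quotes a delay-scaled step size, lr = 0.1/τ with τ the average delay. The snippet below is a hedged illustration of how that rule could be wired into a PyTorch SGD optimizer; the linear model and the value of avg_delay are placeholders, not settings taken from the paper.

```python
import torch
from torch import nn

# Illustrative wiring of the delay-scaled learning rate lr = 0.1 / tau, where
# tau is the average gradient delay. The model shape and avg_delay value are
# placeholders chosen for this sketch, not the authors' configuration.
avg_delay = 8.0                                        # assumed average delay tau
model = nn.Linear(in_features=1000, out_features=2)    # stand-in single-layer linear network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1 / avg_delay)
print(optimizer.param_groups[0]["lr"])                 # 0.0125 for this assumed tau
```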