Implicit Regularization of Decentralized Gradient Descent for Sparse Regression
Authors: Tongle Wu, Ying Sun
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results are provided to validate the effectiveness of DGD and T-DGD for sparse learning through implicit regularization. This section conducts experimental studies to evaluate the theoretical findings of DGD and T-DGD for solving problem (2) in Subsections 6.1 and 6.2, respectively. |
| Researcher Affiliation | Academia | Tongle Wu, The Pennsylvania State University, tfw5381@psu.edu; Ying Sun, The Pennsylvania State University, ybs5190@psu.edu |
| Pseudocode | No | The paper describes algorithms through mathematical equations and textual explanations but does not include explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use vanilla decentralized SGD (DSGD) to train a depth-2, 5000-hidden-unit ReLU network with the cross-entropy loss on the MNIST dataset...on CIFAR10. (A hedged DSGD sketch appears below the table.) |
| Dataset Splits | Yes | 60000 total training samples and 10000 test samples are uniformly allocated to agents. The step sizes were optimally tuned for each α individually to achieve the best validation error. |
| Hardware Specification | Yes | All experiments are conducted on 12th Gen Intel(R) Core(TM) i7-12700@2.10GHz processor and 16.0GB RAM under Windows 11 system. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments (e.g., specific Python or PyTorch versions). |
| Experiment Setup | Yes | We set d = 2000, s = 10, m = 10, N = 400, ρ = 0.1778, α = 10^-6. We select the maximum initialization α that achieves optimal statistical error, resulting in α = 10^-8 for d = 4×10^2, α = 10^-8.5 for d = 4×10^3, and α = 10^-9 for d = 4×10^4. Each agent uses the same batch size 256 to train in DSGD, with a small step size 10^-4 for 2000 epochs. (A hedged code sketch using these values appears below the table.) |
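
Below is a minimal, self-contained sketch of decentralized gradient descent (DGD) with small initialization on an over-parameterized sparse least-squares problem, using the sizes quoted in the Experiment Setup row (d = 2000, s = 10, m = 10, N = 400, α = 10^-6). The Hadamard parameterization θ = u⊙u − v⊙v, the ring mixing matrix, the step size, and the iteration count are illustrative assumptions and not taken from the paper's problem (2).

```python
# Hedged sketch of DGD with small initialization for sparse regression.
# Assumptions: Hadamard parameterization theta = u*u - v*v, ring mixing
# matrix, noiseless Gaussian design, and illustrative eta/T values.
import numpy as np

rng = np.random.default_rng(0)

d, s, m, N = 2000, 10, 10, 400      # dimension, sparsity, agents, total samples
n = N // m                          # samples per agent

# Ground-truth s-sparse signal and Gaussian measurements, split across agents.
theta_star = np.zeros(d)
theta_star[rng.choice(d, s, replace=False)] = 1.0
A = rng.standard_normal((N, d))
y = A @ theta_star
A_loc, y_loc = A.reshape(m, n, d), y.reshape(m, n)

# Doubly stochastic mixing matrix for a ring graph (assumed topology).
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] += 0.25
    W[i, (i + 1) % m] += 0.25

alpha, eta, T = 1e-6, 0.05, 2000    # small initialization, step size, iterations
u = alpha * np.ones((m, d))
v = alpha * np.ones((m, d))

for _ in range(T):
    theta = u * u - v * v                                   # per-agent estimates
    resid = np.einsum('ijd,id->ij', A_loc, theta) - y_loc   # local residuals
    grad_theta = np.einsum('ij,ijd->id', resid, A_loc) / n  # grad of local LS loss
    # DGD update: average with neighbors via W, then take a local gradient step.
    u, v = W @ u - eta * (2 * grad_theta * u), W @ v - eta * (-2 * grad_theta * v)

theta_bar = (u * u - v * v).mean(axis=0)
print("relative error:", np.linalg.norm(theta_bar - theta_star) / np.linalg.norm(theta_star))
```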
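Similarly, a hedged sketch of the decentralized SGD (DSGD) experiment quoted in the Open Datasets and Dataset Splits rows: agents hold uniform shards of MNIST and train local copies of a depth-2, 5000-hidden-unit ReLU network with cross-entropy loss, batch size 256, and step size 10^-4, mixing parameters with neighbors at every step. Only the architecture, loss, batch size, and step size come from the quoted text; the number of agents, ring topology, and iteration count are assumptions.

```python
# Hedged sketch of synchronous DSGD on MNIST with a depth-2, 5000-unit ReLU net.
# Assumed: 10 agents on a ring; quoted: batch size 256, step size 1e-4.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

m = 10                                         # number of agents (assumed)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Uniformly allocate the 60000 MNIST training samples to the agents.
train = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
shards = random_split(train, [len(train) // m] * m)

def batches(ds):                               # endless local mini-batch stream
    while True:
        for b in DataLoader(ds, batch_size=256, shuffle=True):
            yield b

streams = [batches(s) for s in shards]
nets = [nn.Sequential(nn.Flatten(), nn.Linear(784, 5000), nn.ReLU(),
                      nn.Linear(5000, 10)).to(device) for _ in range(m)]
loss_fn = nn.CrossEntropyLoss()
lr = 1e-4                                      # quoted small step size

for step in range(50):                         # a few illustrative iterations
    grads = []
    for net, stream in zip(nets, streams):     # local stochastic gradients
        x, y = next(stream)
        loss = loss_fn(net(x.to(device)), y.to(device))
        net.zero_grad()
        loss.backward()
        grads.append([p.grad.clone() for p in net.parameters()])
    with torch.no_grad():                      # synchronous DSGD update
        snap = [[p.clone() for p in net.parameters()] for net in nets]
        for i, net in enumerate(nets):
            left, right = (i - 1) % m, (i + 1) % m
            for k, p in enumerate(net.parameters()):
                mixed = 0.5 * snap[i][k] + 0.25 * snap[left][k] + 0.25 * snap[right][k]
                p.copy_(mixed - lr * grads[i][k])   # mix with ring neighbors, then step
```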