Implicit Regularization of Decentralized Gradient Descent for Sparse Regression

Authors: Tongle Wu, Ying Sun

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical results are provided to validate the effectiveness of DGD and T-DGD for sparse learning through implicit regularization. This section conducts experimental studies to evaluate the theoretical findings of DGD and T-DGD for solving problem (2), in Subsection 6.1 and Subsection 6.2 respectively.
Researcher Affiliation | Academia | Tongle Wu, The Pennsylvania State University, tfw5381@psu.edu; Ying Sun, The Pennsylvania State University, ybs5190@psu.edu
Pseudocode | No | The paper describes algorithms through mathematical equations and textual explanations but does not include explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We use vanilla decentralized SGD (DSGD) to train a depth-2, 5000-hidden-unit ReLU network with the cross-entropy loss on the MNIST dataset ... on CIFAR10.
Dataset Splits | Yes | 60000 total training samples and 10000 test samples are uniformly allocated to agents (see the DSGD data-allocation sketch after the table). The step sizes were optimally tuned for each α individually to achieve the best validation error.
Hardware Specification | Yes | All experiments are conducted on a 12th Gen Intel(R) Core(TM) i7-12700 @ 2.10 GHz processor with 16.0 GB of RAM under Windows 11.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments (e.g., specific Python or PyTorch versions).
Experiment Setup | Yes | We set d = 2000, s = 10, m = 10, N = 400, ρ = 0.1778, α = 10^-6. We select the maximum initialization α that achieves optimal statistical error, resulting in α = 10^-8 for d = 4 × 10^2, α = 10^-8.5 for d = 4 × 10^3, and α = 10^-9 for d = 4 × 10^4. Each agent uses the same batch size of 256 to train in DSGD, with a small step size of 10^-4 for 2000 epochs. (Illustrative sketches of the DGD and DSGD setups follow the table.)
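
For readers reconstructing the first quoted setup, the following is a minimal sketch of decentralized gradient descent for sparse regression under the settings above (d = 2000, s = 10, m = 10, N = 400, α = 10^-6). The paper's problem (2) and exact update rule are not reproduced in this table, so the Hadamard overparametrization theta = u ⊙ u − v ⊙ v, the synthetic Gaussian data, the ring mixing matrix W, the per-agent sample split, the step size eta, and the iteration budget T are all illustrative assumptions rather than the authors' exact configuration.

# Illustrative DGD-for-sparse-regression sketch. Assumptions: Hadamard
# overparametrization theta = u*u - v*v, standard DGD update (mixing plus
# local gradient step), synthetic Gaussian data, ring mixing matrix.
# Only d, s, m, N, alpha come from the quoted settings; eta and T are
# hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)
d, s, m, N, alpha = 2000, 10, 10, 400, 1e-6
eta, T = 0.1, 1000                            # hypothetical step size and iteration budget

# Ground-truth s-sparse signal and per-agent data (assumed generation).
theta_star = np.zeros(d)
theta_star[rng.choice(d, s, replace=False)] = 1.0
n = N // m                                    # samples per agent (assumed split of N)
A = rng.standard_normal((m, n, d)) / np.sqrt(n)
y = np.einsum("mnd,d->mn", A, theta_star)

# Doubly stochastic ring mixing matrix (assumption; its spectral gap does not
# match the quoted rho = 0.1778).
W = np.zeros((m, m))
for i in range(m):
    W[i, i], W[i, (i - 1) % m], W[i, (i + 1) % m] = 0.5, 0.25, 0.25

# Small identical initialization at scale alpha drives the implicit bias toward sparsity.
u = np.full((m, d), alpha)
v = np.full((m, d), alpha)

for t in range(T):
    theta = u * u - v * v                     # each agent's current iterate
    r = np.einsum("mnd,md->mn", A, theta) - y # per-agent residuals
    g = np.einsum("mn,mnd->md", r, A)         # per-agent gradients w.r.t. theta
    # DGD step: mix with neighbors via W, then take a local gradient step
    # through the overparametrization (d theta/d u = 2u, d theta/d v = -2v).
    u = W @ u - eta * 2.0 * u * g
    v = W @ v + eta * 2.0 * v * g

theta_hat = (u * u - v * v).mean(axis=0)      # average the agents' estimates
print("estimation error:", np.linalg.norm(theta_hat - theta_star))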
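
Similarly, a minimal sketch of the quoted DSGD neural-network experiment: the 60000 MNIST training samples are uniformly allocated to the agents, and each agent trains a depth-2 ReLU network with 5000 hidden units under the cross-entropy loss, batch size 256, and step size 10^-4. The DSGD update used here (local SGD step followed by parameter mixing with a doubly stochastic matrix W) and the ring topology are assumptions; only the dataset, architecture, and hyperparameters come from the quoted text.

# Minimal DSGD sketch (assumed update: local SGD step, then mixing with a
# doubly stochastic matrix W over a ring topology).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

m = 10                                        # number of agents (matches m = 10 above)
lr, batch_size, epochs = 1e-4, 256, 2000      # quoted DSGD settings

# Depth-2 ReLU network with 5000 hidden units, trained with cross-entropy.
def make_net():
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 5000), nn.ReLU(), nn.Linear(5000, 10))

# Uniformly allocate the 60000 MNIST training samples to the m agents.
train = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
shards = random_split(train, [len(train) // m] * m)
loaders = [DataLoader(s, batch_size=batch_size, shuffle=True) for s in shards]

# Doubly stochastic ring mixing matrix (assumption; the paper's W is not quoted).
W = torch.zeros(m, m)
for i in range(m):
    W[i, i], W[i, (i - 1) % m], W[i, (i + 1) % m] = 0.5, 0.25, 0.25

nets = [make_net() for _ in range(m)]
loss_fn = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for batches in zip(*loaders):
        # 1) Local SGD step at every agent.
        for net, (x, y) in zip(nets, batches):
            loss = loss_fn(net(x), y)
            net.zero_grad()
            loss.backward()
            with torch.no_grad():
                for p in net.parameters():
                    p -= lr * p.grad
        # 2) Mixing: each agent averages parameters with its neighbors via W.
        with torch.no_grad():
            params = [list(net.parameters()) for net in nets]
            mixed = [[sum(W[i, j] * params[j][k] for j in range(m))
                      for k in range(len(params[i]))] for i in range(m)]
            for i in range(m):
                for p, q in zip(nets[i].parameters(), mixed[i]):
                    p.copy_(q)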