Implicit Regularization of Decentralized Gradient Descent for Sparse Regression
Authors: Tongle Wu, Ying Sun
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results are provided to validate the effectiveness of DGD and T-DGD for sparse learning through implicit regularization. This section conducts experimental studies to evaluate the theoretical findings of DGD and T-DGD for solving problem (2) in Subsections 6.1 and 6.2, respectively. |
| Researcher Affiliation | Academia | Tongle Wu, The Pennsylvania State University, tfw5381@psu.edu; Ying Sun, The Pennsylvania State University, ybs5190@psu.edu |
| Pseudocode | No | The paper describes algorithms through mathematical equations and textual explanations but does not include explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use vanilla decentralized SGD (DSGD) to train a depth-2, 5000-hidden-unit ReLU network with the cross-entropy loss on the MNIST dataset...on CIFAR10. (A hedged DSGD sketch appears below the table.) |
| Dataset Splits | Yes | 60000 total training samples and 10000 test samples are uniformly allocated to agents. The step sizes were optimally tuned for each α individually to achieve the best validation error. |
| Hardware Specification | Yes | All experiments are conducted on 12th Gen Intel(R) Core(TM) i7-12700@2.10GHz processor and 16.0GB RAM under Windows 11 system. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments (e.g., specific Python or PyTorch versions). |
| Experiment Setup | Yes | We set d = 2000, s = 10, m = 10, N = 400, ρ = 0.1778, α = 10^-6. We select the maximum initialization α that achieves optimal statistical error, resulting in α = 10^-8 for d = 4×10^2, α = 10^-8.5 for d = 4×10^3, and α = 10^-9 for d = 4×10^4. Each agent uses the same batch size 256 to train in DSGD, with a small step size 10^-4 for 2000 epochs. (A hedged code sketch using these values appears below the table.) |
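
Below is a minimal, self-contained sketch of decentralized gradient descent (DGD) with small initialization on an over-parameterized sparse least-squares problem, using the sizes quoted in the Experiment Setup row (d = 2000, s = 10, m = 10, N = 400, α = 10^-6). The Hadamard parameterization θ = u⊙u − v⊙v, the ring mixing matrix, the step size, and the iteration count are illustrative assumptions and not taken from the paper's problem (2).

```python
# Hedged sketch of DGD with small initialization for sparse regression.
# Assumptions: Hadamard parameterization theta = u*u - v*v, ring mixing
# matrix, noiseless Gaussian design, and illustrative eta/T values.
import numpy as np

rng = np.random.default_rng(0)

d, s, m, N = 2000, 10, 10, 400      # dimension, sparsity, agents, total samples
n = N // m                          # samples per agent

# Ground-truth s-sparse signal and Gaussian measurements, split across agents.
theta_star = np.zeros(d)
theta_star[rng.choice(d, s, replace=False)] = 1.0
A = rng.standard_normal((N, d))
y = A @ theta_star
A_loc, y_loc = A.reshape(m, n, d), y.reshape(m, n)

# Doubly stochastic mixing matrix for a ring graph (assumed topology).
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] += 0.25
    W[i, (i + 1) % m] += 0.25

alpha, eta, T = 1e-6, 0.05, 2000    # small initialization, step size, iterations
u = alpha * np.ones((m, d))
v = alpha * np.ones((m, d))

for _ in range(T):
    theta = u * u - v * v                                   # per-agent estimates
    resid = np.einsum('ijd,id->ij', A_loc, theta) - y_loc   # local residuals
    grad_theta = np.einsum('ij,ijd->id', resid, A_loc) / n  # grad of local LS loss
    # DGD update: average with neighbors via W, then take a local gradient step.
    u, v = W @ u - eta * (2 * grad_theta * u), W @ v - eta * (-2 * grad_theta * v)

theta_bar = (u * u - v * v).mean(axis=0)
print("relative error:", np.linalg.norm(theta_bar - theta_star) / np.linalg.norm(theta_star))
```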
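Similarly, a hedged sketch of the decentralized SGD (DSGD) experiment quoted in the Open Datasets and Dataset Splits rows: agents hold uniform shards of MNIST and train local copies of a depth-2, 5000-hidden-unit ReLU network with cross-entropy loss, batch size 256, and step size 10^-4, mixing parameters with neighbors at every step. Only the architecture, loss, batch size, and step size come from the quoted text; the number of agents, ring topology, and iteration count are assumptions.

```python
# Hedged sketch of synchronous DSGD on MNIST with a depth-2, 5000-unit ReLU net.
# Assumed: 10 agents on a ring; quoted: batch size 256, step size 1e-4.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

m = 10                                         # number of agents (assumed)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Uniformly allocate the 60000 MNIST training samples to the agents.
train = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
shards = random_split(train, [len(train) // m] * m)

def batches(ds):                               # endless local mini-batch stream
    while True:
        for b in DataLoader(ds, batch_size=256, shuffle=True):
            yield b

streams = [batches(s) for s in shards]
nets = [nn.Sequential(nn.Flatten(), nn.Linear(784, 5000), nn.ReLU(),
                      nn.Linear(5000, 10)).to(device) for _ in range(m)]
loss_fn = nn.CrossEntropyLoss()
lr = 1e-4                                      # quoted small step size

for step in range(50):                         # a few illustrative iterations
    grads = []
    for net, stream in zip(nets, streams):     # local stochastic gradients
        x, y = next(stream)
        loss = loss_fn(net(x.to(device)), y.to(device))
        net.zero_grad()
        loss.backward()
        grads.append([p.grad.clone() for p in net.parameters()])
    with torch.no_grad():                      # synchronous DSGD update
        snap = [[p.clone() for p in net.parameters()] for net in nets]
        for i, net in enumerate(nets):
            left, right = (i - 1) % m, (i + 1) % m
            for k, p in enumerate(net.parameters()):
                mixed = 0.5 * snap[i][k] + 0.25 * snap[left][k] + 0.25 * snap[right][k]
                p.copy_(mixed - lr * grads[i][k])   # mix with ring neighbors, then step
```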