ProxSGD: Training Structured Neural Networks under Regularization and Constraints

Authors: Yang Yang, Yaxiong Yuan, Avraam Chatzimichailidis, Ruud JG van Sloun, Lei Lei, Symeon Chatzinotas

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, to support the theoretical analysis and demonstrate the flexibility of ProxSGD, we show by extensive numerical tests how ProxSGD can be used to train either sparse or binary neural networks through an adequate selection of the regularization function and constraint set."
Researcher Affiliation | Collaboration | Yang Yang (Fraunhofer ITWM; Fraunhofer Center Machine Learning), yang.yang@itwm.fraunhofer.de; Yaxiong Yuan (University of Luxembourg), yaxiong.yuan@uni.lu; Avraam Chatzimichailidis (Fraunhofer ITWM; TU Kaiserslautern), avraam.chatzimichailidis@itwm.fraunhofer.de; Ruud JG van Sloun (Eindhoven University of Technology), r.j.g.v.sloun@tue.nl; Lei Lei and Symeon Chatzinotas (University of Luxembourg), {lei.lei, symeon.chatzinotas}@uni.lu
Pseudocode | Yes | "Algorithm 1 Proximal-type Stochastic Gradient Descent (ProxSGD) Method" (an illustrative sketch of this update step appears after the table)
Open Source Code | Yes | "The simulations in Sections 3.1 and 3.3 are implemented in TensorFlow and available at https://github.com/optyang/proxsgd. The simulations in Section 3.2 are implemented in PyTorch and available at https://github.com/cc-hpc-itwm/proxsgd."
Open Datasets | Yes | "We first consider the multiclass classification problem on CIFAR-10 dataset (Krizhevsky, 2009)"
Dataset Splits | No | No explicit description of dataset splits (e.g., percentages, sample counts, or methodology for splitting data into train, validation, and test sets) was found. The paper mentions using well-known datasets such as CIFAR-10, CIFAR-100, and MNIST, which have standard splits, but these splits are not detailed or cited within the text.
Hardware Specification | No | No specific hardware details (such as GPU models, CPU types, or memory) used for running the experiments were provided. The paper only states that the simulations were implemented in TensorFlow and PyTorch.
Software Dependencies | No | The paper states that the simulations are implemented in TensorFlow and PyTorch but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | "Following the parameter configurations of ADAM in Kingma & Ba (2015), AMSGrad in Reddi et al. (2018), and ADABound in Luo et al. (2019), we set ρ = 0.1, β = 0.999 and ϵ = 0.001 (see Table 1), which are uniform for all the algorithms and commonly used in practice. Note that we have also activated ℓ1-regularization for these algorithms in the built-in function in TensorFlow/PyTorch, which amounts to adding the subgradient of the ℓ1-norm to the gradient of the loss function. For the proposed ProxSGD, ϵ(t) and ρ(t) decrease over the iterations as follows: ϵ(t) = 0.06/(t + 4)^0.5, ρ(t) = 0.9/(t + 4)^0.5. Recall that the ℓ1-norm in the approximation subproblem naturally leads to the soft-thresholding proximal mapping, see (10). The regularization parameter µ in the soft-thresholding then permits controlling the sparsity of the parameter variable x; in this experiment we set µ = 5 × 10^-5." (The decay schedules and µ are restated in the code snippet after the table.)
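
For a concrete picture of the update that the paper's Algorithm 1 describes, the following is a minimal Python sketch of a generic proximal stochastic-gradient step with an ℓ1 regularizer, assembled only from the quotes above (momentum on the gradient, a soft-thresholding proximal mapping, a decaying step size). It is not a line-for-line reproduction of Algorithm 1; the names `prox_sgd_step` and `soft_threshold` and the scalar curvature `tau` are our own simplifications.

```python
import numpy as np

def soft_threshold(z, thr):
    """Proximal mapping of thr * ||.||_1: element-wise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def prox_sgd_step(x, v, grad, rho_t, eps_t, mu, tau=1.0):
    """One illustrative ProxSGD-style iteration (sketch, not the paper's exact Algorithm 1).

    x     : current parameters (np.ndarray)
    v     : running momentum estimate of the stochastic gradient
    grad  : stochastic gradient evaluated at x
    rho_t : momentum weight at iteration t (decaying)
    eps_t : step size at iteration t (decaying)
    mu    : l1 regularization weight (controls sparsity)
    tau   : quadratic coefficient of the approximation subproblem (scalar here)
    """
    # Momentum: exponentially weighted estimate of the gradient.
    v = (1.0 - rho_t) * v + rho_t * grad
    # With an l1 regularizer, the approximation subproblem has the
    # closed-form soft-thresholding solution around a scaled gradient step.
    x_hat = soft_threshold(x - v / tau, mu / tau)
    # Move toward the subproblem solution with step size eps_t.
    x_new = x + eps_t * (x_hat - x)
    return x_new, v
```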
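
The decay schedules and sparsity parameter quoted in the Experiment Setup row translate directly into code. The snippet below only restates those quoted values and feeds the hypothetical `prox_sgd_step` sketch above; the iteration counter `t` is assumed to start at 0.

```python
# Quoted schedules for the proposed ProxSGD (t is the iteration counter).
def eps_schedule(t):
    return 0.06 / (t + 4) ** 0.5   # step size eps(t)

def rho_schedule(t):
    return 0.9 / (t + 4) ** 0.5    # momentum weight rho(t)

MU = 5e-5  # l1 regularization weight mu used in the soft-thresholding

# First few values of the schedules, to show the decay.
for t in (0, 1, 2):
    print(t, round(eps_schedule(t), 4), round(rho_schedule(t), 4))

# Example usage with the sketch above (x, v, grad assumed given):
# x, v = prox_sgd_step(x, v, grad, rho_schedule(t), eps_schedule(t), MU)
```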