LEARNING TO SHARE: SIMULTANEOUS PARAMETER TYING AND SPARSIFICATION IN DEEP LEARNING

Authors: Dejiao Zhang, Haozhu Wang, Mario Figueiredo, Laura Balzano

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate this approach on several benchmark datasets, showing that it can dramatically compress the network with slight or even no loss on generalization accuracy.
Researcher Affiliation | Academia | Dejiao Zhang, University of Michigan, Ann Arbor, USA, dejiao@umich.edu; Haozhu Wang, University of Michigan, Ann Arbor, USA, hzwang@umich.edu; Mário A. T. Figueiredo, Instituto de Telecomunicações and Instituto Superior Técnico, University of Lisbon, Portugal, mario.figueiredo@lx.it.pt; Laura Balzano, University of Michigan, Ann Arbor, USA, girasole@umich.edu
Pseudocode | Yes | The training method is summarized in Algorithm 1. Algorithm 2: Prox-GrOWL (Bogdan et al., 2015) for solving prox_{η,Ω_λ}(z). Algorithm 3: Affinity Propagation (Frey & Dueck, 2007). A sketch of the GrOWL proximal step is given below the table.
Open Source Code | No | The paper does not provide a direct link to the source code for the methodology described, nor does it explicitly state that the code is being released or available in supplementary materials.
Open Datasets | Yes | We assess the performance of the proposed method on two benchmark datasets: MNIST and CIFAR-10. The MNIST dataset contains centered images of handwritten digits (0-9), of size 28 × 28 (784) pixels.
Dataset Splits | No | The paper mentions '10000 training and 1000 testing examples' for synthetic data, and uses MNIST and CIFAR-10, but it does not specify explicit train/validation/test splits, percentages, or validation set sizes for any of the datasets used in the experiments.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | We implement all models using TensorFlow (Abadi et al., 2016). In this paper, we use the built-in affinity propagation method of the scikit-learn package (Buitinck et al., 2013). The paper mentions software such as TensorFlow and scikit-learn but does not provide specific version numbers for these dependencies. A clustering/tying sketch that uses scikit-learn's affinity propagation is given below the table.
Experiment Setup | Yes | For the MNIST experiment: the network is trained for 300 epochs and then retrained for an additional 100 epochs, both with momentum. The initial learning rate is set to 0.001, for both training and retraining, and is reduced by a factor of 0.96 every 10 epochs. We set p = 0.5, and Λ1, Λ2 are selected by grid search. For the CIFAR-10 experiment: we first train the network under different regularizers for 150 epochs, then retrain it for another 50 epochs, using the learning rate decay scheme described by He et al. (2016): the initial rates for the training and retraining phases are set to 0.01 and 0.001, respectively; the learning rate is multiplied by 0.1 every 60 epochs of the training phase, and every 20 epochs of the retraining phase. For GrOWL (+ℓ2), we set p = 0.1n (see Eq. (9)) for all layers, where n denotes the number of rows of the (reshaped) weight matrices of each layer. Both learning-rate schedules are sketched as code below the table.
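
Below is a minimal NumPy sketch of the GrOWL proximal step that Algorithm 2 (Prox-GrOWL) computes: the row 2-norms of a (reshaped) weight matrix are sorted, shrunk by the ordered weights, projected onto the non-increasing non-negative cone via pool-adjacent-violators, and each row is rescaled accordingly. The function names prox_growl and _pava_nonincreasing and the step-size argument eta are illustrative; this is the standard OWL/SLOPE prox recipe applied to row norms, not the authors' code.

```python
import numpy as np


def _pava_nonincreasing(y):
    """Pool-adjacent-violators: project y onto the set of non-increasing sequences."""
    vals, wts = [], []
    for v in y.astype(float):
        w = 1.0
        # Merge blocks while the non-increasing order is violated.
        while vals and vals[-1] < v:
            v_prev, w_prev = vals.pop(), wts.pop()
            v = (w * v + w_prev * v_prev) / (w + w_prev)
            w += w_prev
        vals.append(v)
        wts.append(w)
    return np.repeat(vals, np.asarray(wts, dtype=int))


def prox_growl(W, lam, eta=1.0):
    """One GrOWL proximal step on the rows of W.

    lam is a non-negative, non-increasing vector with one entry per row of W
    (lam[0] >= lam[1] >= ... >= 0); eta is the step size.
    """
    norms = np.linalg.norm(W, axis=1)             # group = row of the weight matrix
    order = np.argsort(-norms)                    # sort row norms in decreasing order
    shifted = norms[order] - eta * np.asarray(lam, dtype=float)
    proxed = np.maximum(_pava_nonincreasing(shifted), 0.0)
    new_norms = np.empty_like(norms)
    new_norms[order] = proxed                     # undo the sort
    scale = np.divide(new_norms, norms, out=np.zeros_like(norms), where=norms > 0)
    return W * scale[:, None]                     # rows with zero proximal norm are pruned
```

Rows whose proximal norm reaches zero are sparsified away, while rows pooled to a common norm by the projection become candidates for the tying step described next.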
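
The tying step can be illustrated with scikit-learn's built-in AffinityPropagation (Frey & Dueck, 2007). The sketch below is not the authors' pipeline: the random weight matrix, the similarity (scikit-learn's default negative squared Euclidean distance), the damping value, and the choice to tie each cluster to its mean row are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical trained weight matrix: one row per input neuron/feature.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))

# Drop rows that the GrOWL regularizer has zeroed out; only surviving rows are clustered.
nonzero = np.linalg.norm(W, axis=1) > 1e-8
rows = W[nonzero]

# scikit-learn's built-in affinity propagation; its default similarity is the
# negative squared Euclidean distance between rows.
ap = AffinityPropagation(damping=0.9, random_state=0).fit(rows)

# Tie parameters within each cluster by replacing every member row with the
# cluster mean (an illustrative choice; the exemplar row would be another option).
tied = rows.copy()
for k in np.unique(ap.labels_):
    members = ap.labels_ == k
    tied[members] = rows[members].mean(axis=0)

W_tied = W.copy()
W_tied[nonzero] = tied
print(f"{rows.shape[0]} surviving rows tied into {np.unique(ap.labels_).size} clusters")
```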
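
The two learning-rate schedules quoted in the Experiment Setup row can be written as plain Python functions. This reads "reduced by a factor of 0.96" as multiplication by 0.96 per 10-epoch step; the function names and the epoch-indexed interface are illustrative.

```python
def mnist_lr(epoch):
    # MNIST: start at 0.001 and multiply by 0.96 every 10 epochs (same schedule
    # for the 300-epoch training and the 100-epoch retraining phases).
    return 1e-3 * 0.96 ** (epoch // 10)


def cifar10_lr(epoch, retraining=False):
    # CIFAR-10 (He et al., 2016 style step decay): training starts at 0.01 and is
    # multiplied by 0.1 every 60 epochs; retraining starts at 0.001 and is
    # multiplied by 0.1 every 20 epochs.
    base, period = (1e-3, 20) if retraining else (1e-2, 60)
    return base * 0.1 ** (epoch // period)
```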