Training Structured Neural Networks Through Manifold Identification and Variance Reduction
Authors: Zih-Syuan Huang, Ching-pei Lee
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on training NNs with structured sparsity confirm that variance reduction is necessary for such an identification, and show that RMDA thus significantly outperforms existing methods for this task. For unstructured sparsity, RMDA also outperforms a state-of-the-art pruning method, validating the benefits of training structured NNs through regularization. |
| Researcher Affiliation | Academia | Zih-Syuan Huang Academia Sinica zihsyuan@stat.sinica.edu.tw Ching-pei Lee Academia Sinica leechingpei@gmail.com |
| Pseudocode | Yes | Details of the proposed RMDA are in Algorithm 1. At the $t$-th iteration with the iterate $W^{t-1}$, we draw an independent sample $\xi_t \sim \mathcal{D}$ to compute the stochastic gradient $\nabla f_{\xi_t}(W^{t-1})$, decide a learning rate $\eta_t$, and update the weighted sum $V_t$ of previous stochastic gradients using $\eta_t$ and the scaling factor $\beta_t$: $V_0 := 0$, $V_t := \sum_{k=1}^{t} \eta_k \beta_k \nabla f_{\xi_k}(W^{k-1}) = V_{t-1} + \eta_t \beta_t \nabla f_{\xi_t}(W^{t-1})$, $t > 0$. Algorithm 1: RMDA$(W^0, T, \eta(\cdot), c(\cdot))$. (A hedged code sketch of this update follows the table.) |
| Open Source Code | Yes | Implementation of RMDA is available at https://www.github.com/zihsyuan1214/rmda. |
| Open Datasets | Yes | The two simpler models are linear logistic regression with the MNIST dataset (LeCun et al., 1998), and training a small NN with seven fully-connected layers on the Fashion-MNIST dataset (Xiao et al., 2017). ...A modified VGG19 (Simonyan & Zisserman, 2015) with the CIFAR10 dataset (Krizhevsky, 2009), 4. The same modified VGG19 with the CIFAR100 dataset (Krizhevsky, 2009), 5. ResNet50 (He et al., 2016) with the CIFAR10 dataset, and 6. ResNet50 with the CIFAR100 dataset. |
| Dataset Splits | Yes | To compare these algorithms, we examine both the validation accuracy and the group sparsity level of their trained models. ... All results shown in tables in Sections 5.1 and 5.2 are the mean and standard deviation of three independent runs with the same hyperparameters, while figures use one representative run for better visualization. ... Table 1: Group sparsity and validation accuracy of different methods. |
| Hardware Specification | Yes | We also report that in the ResNet50/CIFAR100 task, on our NVIDIA RTX 8000 GPU, MSGD, ProxSGD, and RMDA have similar per-epoch cost of 68, 77, and 91 seconds respectively, while ProxSSI needs 674 seconds per epoch. |
| Software Dependencies | No | RMDA and the following methods for structured sparsity in deep learning are compared using PyTorch (Paszke et al., 2019). ... For RigL, we use the PyTorch implementation of Sundar & Dwaraknath (2021). |
| Experiment Setup | Yes | Throughout the experiments, we always use multi-step learning rate scheduling that decays the learning rate by a constant factor every time the epoch count reaches a pre-specified threshold. For all methods, we conduct grid searches to find the best hyperparameters. All results shown in tables in Sections 5.1 and 5.2 are the mean and standard deviation of three independent runs with the same hyperparameters, while figures use one representative run for better visualization. ...Tables 3 to 13 provide detailed settings of Section 5.2. For example, Table 3 states: Learning rate schedule $\eta(\text{epoch}) = 10^{-1-\lfloor \text{epoch}/50 \rfloor}$, Momentum $10^{-1}$, Total epochs 500. (A schedule sketch also follows this table.) |
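
The update quoted in the Pseudocode row accumulates a weighted sum $V_t$ of stochastic gradients. Below is a minimal PyTorch-style sketch of that loop. Only the $V_t$ recursion is taken from the excerpt; the choice $\beta_t = \sqrt{t}$, the proximal step, and the averaging with $c_t$ are assumptions following the usual regularized dual-averaging template rather than a verbatim copy of Algorithm 1, and `stoch_grad` and `prox` are hypothetical callables supplied by the caller.

```python
import torch

def rmda_sketch(W0, T, eta, c, stoch_grad, prox):
    """Hedged sketch of the RMDA iteration quoted above.

    Only V_t = V_{t-1} + eta_t * beta_t * grad f_{xi_t}(W^{t-1}) comes from the
    excerpt; beta_t = sqrt(t), the proximal step, and the c_t averaging are
    assumptions, not the paper's exact Algorithm 1.
    """
    W = W0.clone()
    V = torch.zeros_like(W0)        # V_0 := 0
    alpha = 0.0                     # assumed running sum of eta_k * beta_k
    for t in range(1, T + 1):
        eta_t, c_t = eta(t), c(t)
        beta_t = t ** 0.5                       # assumed scaling factor
        g = stoch_grad(W)                       # grad f_{xi_t}(W^{t-1}) on a fresh sample xi_t
        V = V + eta_t * beta_t * g              # V_t = V_{t-1} + eta_t * beta_t * g
        alpha += eta_t * beta_t
        W_bar = prox(W0 - V / beta_t, alpha / beta_t)  # assumed regularized dual-averaging step
        W = (1.0 - c_t) * W + c_t * W_bar       # assumed momentum-style averaging
    return W
```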
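
The multi-step schedule described in the Experiment Setup row can be read as decaying the learning rate by a factor of 10 every 50 epochs. A one-line sketch under that assumed reading of the quoted Table 3 entry; the function name and keyword arguments are illustrative, not from the paper.

```python
def multistep_lr(epoch: int, base_lr: float = 1e-1, decay: float = 0.1, step: int = 50) -> float:
    """Multi-step schedule: eta(epoch) = base_lr * decay ** (epoch // step)."""
    return base_lr * decay ** (epoch // step)

# e.g. epochs 0-49 -> 1e-1, 50-99 -> 1e-2, 100-149 -> 1e-3, ...
```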