Training Structured Neural Networks Through Manifold Identification and Variance Reduction
Authors: Zih-Syuan Huang, Ching-pei Lee
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on training NNs with structured sparsity confirm that variance reduction is necessary for such an identification, and show that RMDA thus significantly outperforms existing methods for this task. For unstructured sparsity, RMDA also outperforms a state-of-the-art pruning method, validating the benefits of training structured NNs through regularization. |
| Researcher Affiliation | Academia | Zih-Syuan Huang Academia Sinica zihsyuan@stat.sinica.edu.tw Ching-pei Lee Academia Sinica leechingpei@gmail.com |
| Pseudocode | Yes | Details of the proposed RMDA are in Algorithm 1. At the $t$-th iteration with the iterate $W^{t-1}$, we draw an independent sample $\xi_t \sim \mathcal{D}$ to compute the stochastic gradient $\nabla f_{\xi_t}(W^{t-1})$, decide a learning rate $\eta_t$, and update the weighted sum $V_t$ of previous stochastic gradients using $\eta_t$ and the scaling factor $\beta_t$: $V_0 := 0$, $V_t := \sum_{k=1}^{t} \eta_k \beta_k \nabla f_{\xi_k}(W^{k-1}) = V_{t-1} + \eta_t \beta_t \nabla f_{\xi_t}(W^{t-1})$, $t > 0$. Algorithm 1: RMDA$(W^0, T, \eta(\cdot), c(\cdot))$. (A hedged code sketch of this update follows the table.) |
| Open Source Code | Yes | Implementation of RMDA is available at https://www.github.com/zihsyuan1214/rmda. |
| Open Datasets | Yes | The two simpler models are linear logistic regression with the MNIST dataset (LeCun et al., 1998), and training a small NN with seven fully-connected layers on the Fashion-MNIST dataset (Xiao et al., 2017). ...A modified VGG19 (Simonyan & Zisserman, 2015) with the CIFAR10 dataset (Krizhevsky, 2009), 4. The same modified VGG19 with the CIFAR100 dataset (Krizhevsky, 2009), 5. ResNet50 (He et al., 2016) with the CIFAR10 dataset, and 6. ResNet50 with the CIFAR100 dataset. |
| Dataset Splits | Yes | To compare these algorithms, we examine both the validation accuracy and the group sparsity level of their trained models. ... All results shown in tables in Sections 5.1 and 5.2 are the mean and standard deviation of three independent runs with the same hyperparameters, while figures use one representative run for better visualization. ... Table 1: Group sparsity and validation accuracy of different methods. |
| Hardware Specification | Yes | We also report that in the ResNet50/CIFAR100 task, on our NVIDIA RTX 8000 GPU, MSGD, ProxSGD, and RMDA have similar per-epoch cost of 68, 77, and 91 seconds respectively, while ProxSSI needs 674 seconds per epoch. |
| Software Dependencies | No | RMDA and the following methods for structured sparsity in deep learning are compared using PyTorch (Paszke et al., 2019). ... For RigL, we use the PyTorch implementation of Sundar & Dwaraknath (2021). |
| Experiment Setup | Yes | Throughout the experiments, we always use multi-step learning rate scheduling that decays the learning rate by a constant factor every time the epoch count reaches a pre-specified threshold. For all methods, we conduct grid searches to find the best hyperparameters. All results shown in tables in Sections 5.1 and 5.2 are the mean and standard deviation of three independent runs with the same hyperparameters, while figures use one representative run for better visualization. ...Tables 3 to 13 provide detailed settings of Section 5.2. For example, Table 3 states: Learning rate schedule $\eta(\text{epoch}) = 10^{-1-\lfloor \text{epoch}/50 \rfloor}$, Momentum $10^{-1}$, Total epochs 500. (A schedule sketch also follows this table.) |
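
The update quoted in the Pseudocode row accumulates a weighted sum $V_t$ of stochastic gradients. Below is a minimal PyTorch-style sketch of that loop. Only the $V_t$ recursion is taken from the excerpt; the choice $\beta_t = \sqrt{t}$, the proximal step, and the averaging with $c_t$ are assumptions following the usual regularized dual-averaging template rather than a verbatim copy of Algorithm 1, and `stoch_grad` and `prox` are hypothetical callables supplied by the caller.

```python
import torch

def rmda_sketch(W0, T, eta, c, stoch_grad, prox):
    """Hedged sketch of the RMDA iteration quoted above.

    Only V_t = V_{t-1} + eta_t * beta_t * grad f_{xi_t}(W^{t-1}) comes from the
    excerpt; beta_t = sqrt(t), the proximal step, and the c_t averaging are
    assumptions, not the paper's exact Algorithm 1.
    """
    W = W0.clone()
    V = torch.zeros_like(W0)        # V_0 := 0
    alpha = 0.0                     # assumed running sum of eta_k * beta_k
    for t in range(1, T + 1):
        eta_t, c_t = eta(t), c(t)
        beta_t = t ** 0.5                       # assumed scaling factor
        g = stoch_grad(W)                       # grad f_{xi_t}(W^{t-1}) on a fresh sample xi_t
        V = V + eta_t * beta_t * g              # V_t = V_{t-1} + eta_t * beta_t * g
        alpha += eta_t * beta_t
        W_bar = prox(W0 - V / beta_t, alpha / beta_t)  # assumed regularized dual-averaging step
        W = (1.0 - c_t) * W + c_t * W_bar       # assumed momentum-style averaging
    return W
```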
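
The multi-step schedule described in the Experiment Setup row can be read as decaying the learning rate by a factor of 10 every 50 epochs. A one-line sketch under that assumed reading of the quoted Table 3 entry; the function name and keyword arguments are illustrative, not from the paper.

```python
def multistep_lr(epoch: int, base_lr: float = 1e-1, decay: float = 0.1, step: int = 50) -> float:
    """Multi-step schedule: eta(epoch) = base_lr * decay ** (epoch // step)."""
    return base_lr * decay ** (epoch // step)

# e.g. epochs 0-49 -> 1e-1, 50-99 -> 1e-2, 100-149 -> 1e-3, ...
```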