Deep Rewiring: Training very sparse deep networks
Authors: Guillaume Bellec, David Kappel, Wolfgang Maass, Robert Legenstein
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that DEEP R can be used to train very sparse feedforward and recurrent neural networks on standard benchmark tasks with just a minor loss in performance. We show on several benchmark tasks that with DEEP R, the connectivity of several deep architectures (fully connected deep networks, convolutional nets, and recurrent networks (LSTMs)) can be constrained to be extremely sparse throughout training with a marginal drop in performance. |
| Researcher Affiliation | Academia | Guillaume Bellec, David Kappel, Wolfgang Maass & Robert Legenstein Institute for Theoretical Computer Science Graz University of Technology Austria {bellec,kappel,maass,legenstein}@igi.tugraz.at |
| Pseudocode | Yes | Algorithm 1: Pseudo code of the DEEP R algorithm. Algorithm 2: Pseudo code of the soft-DEEP R algorithm. (A sketch of the DEEP R rewiring step is given below the table.) |
| Open Source Code | Yes | Implementations of DEEP R are freely available at github.com/guillaumeBellec/deep_rewiring. |
| Open Datasets | Yes | For MNIST, we considered a fully connected feed-forward network used in Han et al. (2015b) to benchmark pruning algorithms. On the CIFAR-10 dataset, we used a convolutional neural network (CNN) with two convolutional layers followed by two fully connected layers. As a test bed, we considered an LSTM network trained on the TIMIT data set. |
| Dataset Splits | Yes | A validation set and early stopping were necessary to train a network with dense connectivity matrix on TIMIT because the performance was sometimes unstable and it suddenly dropped during training, as seen in Fig. 3D for ℓ1-shrinkage. Therefore a validation set was defined by randomly selecting 5% of the training utterances. (A sketch of this split is given below the table.) |
| Hardware Specification | No | The paper discusses hardware limitations in general (e.g., "Google's tensor processing units (TPUs)", "neuromorphic hardware"), but it does not specify any particular hardware (e.g., CPU, GPU models, cloud instances) used for running the experiments described in the paper. |
| Software Dependencies | Yes | For reproducibility purposes the network architecture and all parameters of this CNN were taken from the official tutorial of TensorFlow. TensorFlow version 1.3: www.tensorflow.org/tutorials/deep_cnn. To accelerate the training in comparison to the reference from Greff et al. (2017), we used mini-batches of size 32 and the ADAM optimizer (Kingma & Ba (2014)). |
| Experiment Setup | Yes | For MNIST, we considered a fully connected feed-forward network... For all algorithms we used a learning rate of 0.05 and a batch size of 10 with standard stochastic gradient descent. Learning stopped after 10 epochs. In MNIST, 96.3% accuracy under the constraint of 1% connectivity was achieved with α = 10⁻⁴ and T chosen so that T = η²·10⁻¹². In TIMIT, α = 0.03 and T = 0. (The reported hyperparameters are collected into a config sketch below the table.) |
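
The Pseudocode row cites Algorithm 1 (DEEP R) and Algorithm 2 (soft-DEEP R) without reproducing them. The following is a minimal NumPy sketch of one DEEP R rewiring step as the paper describes it: a gradient step with an L1 term and optional noise on the active connection parameters, deactivation of connections whose parameter crosses zero, and random reactivation of the same number of dormant connections so that the connectivity level stays fixed. The function and variable names (`deep_r_step`, `theta`, `sign`, `active`) and the default values are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def deep_r_step(theta, sign, active, grad, eta=0.05, alpha=1e-4, T=0.0, rng=None):
    """One DEEP R rewiring step on a single parameter matrix (sketch).

    theta  : magnitudes of the connection parameters
    sign   : fixed sign (+1 / -1) of every potential connection
    active : boolean mask of the currently active connections
    grad   : gradient of the loss w.r.t. the effective weights w = sign * theta
    """
    rng = rng or np.random.default_rng()
    noise = np.sqrt(2.0 * eta * T) * rng.standard_normal(theta.shape)

    # Gradient step with L1 penalty (and optional noise), applied only to active connections
    theta = np.where(active, theta - eta * (sign * grad + alpha) + noise, theta)

    # Connections whose parameter crossed zero become dormant
    newly_dormant = active & (theta < 0.0)
    active = active & ~newly_dormant

    # Reactivate as many randomly chosen dormant connections (at theta = 0) as were
    # just deactivated, so the total number of active connections stays constant
    dormant_idx = np.flatnonzero(~active)
    n_reactivate = min(int(newly_dormant.sum()), dormant_idx.size)
    if n_reactivate > 0:
        chosen = rng.choice(dormant_idx, size=n_reactivate, replace=False)
        active.flat[chosen] = True
        theta.flat[chosen] = 0.0

    weights = np.where(active, sign * theta, 0.0)  # effective sparse weight matrix
    return theta, active, weights
```

In the full algorithm this step replaces the plain SGD update on every mini-batch; the forward pass and the sparse storage of the weights are not shown here.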
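
For the Dataset Splits row, the 5% validation split of the TIMIT training utterances can be sketched as follows; `train_utterances` and the fixed seed are placeholders, and the paper does not state how the random selection was seeded.

```python
import random

def split_validation(train_utterances, frac=0.05, seed=0):
    """Hold out a random fraction of the training utterances for early stopping (sketch)."""
    utterances = list(train_utterances)
    random.Random(seed).shuffle(utterances)
    n_val = max(1, int(frac * len(utterances)))
    return utterances[n_val:], utterances[:n_val]  # (training set, validation set)
```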
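
The hyperparameters quoted in the Experiment Setup and Software Dependencies rows can be grouped into a small configuration for reference. Attributing the ADAM optimizer and batch size 32 to the TIMIT LSTM follows the Greff et al. (2017) context of that quote, and the variable names and grouping below are illustrative assumptions rather than the authors' configuration format.

```python
# Hypothetical grouping of the reported hyperparameters; names and structure are illustrative.
eta = 0.05                          # learning rate (MNIST, all algorithms)
mnist_config = dict(
    optimizer="sgd",
    learning_rate=eta,
    batch_size=10,
    epochs=10,
    connectivity=0.01,              # 1% of connections kept active
    alpha=1e-4,                     # regularization coefficient on active connections
    temperature=eta**2 * 1e-12,     # T, as quoted in the Experiment Setup row
)
timit_config = dict(
    optimizer="adam",               # ADAM with mini-batches of 32 (Software Dependencies row)
    batch_size=32,
    alpha=0.03,
    temperature=0.0,                # T = 0 for TIMIT
)
```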