Deep Rewiring: Training very sparse deep networks

Authors: Guillaume Bellec, David Kappel, Wolfgang Maass, Robert Legenstein

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that DEEP R can be used to train very sparse feedforward and recurrent neural networks on standard benchmark tasks with just a minor loss in performance. We show on several benchmark tasks that with DEEP R, the connectivity of several deep architectures (fully connected deep networks, convolutional nets, and recurrent networks (LSTMs)) can be constrained to be extremely sparse throughout training with a marginal drop in performance.
Researcher Affiliation | Academia | Guillaume Bellec, David Kappel, Wolfgang Maass & Robert Legenstein, Institute for Theoretical Computer Science, Graz University of Technology, Austria. {bellec,kappel,maass,legenstein}@igi.tugraz.at
Pseudocode | Yes | Algorithm 1: Pseudo code of the DEEP R algorithm. Algorithm 2: Pseudo code of the soft-DEEP R algorithm. (A minimal sketch of the DEEP R rewiring step follows the table.)
Open Source Code | Yes | Implementations of DEEP R are freely available at github.com/guillaumeBellec/deep_rewiring.
Open Datasets | Yes | For MNIST, we considered a fully connected feed-forward network used in Han et al. (2015b) to benchmark pruning algorithms. On the CIFAR-10 dataset, we used a convolutional neural network (CNN) with two convolutional layers followed by two fully connected layers. As a test bed, we considered an LSTM network trained on the TIMIT data set.
Dataset Splits | Yes | A validation set and early stopping were necessary to train a network with dense connectivity matrix on TIMIT because the performance was sometimes unstable and it suddenly dropped during training, as seen in Fig. 3D for ℓ1-shrinkage. Therefore a validation set was defined by randomly selecting 5% of the training utterances. (A sketch of such a split follows the table.)
Hardware Specification | No | The paper discusses hardware limitations in general (e.g., "Google's tensor processing units (TPUs)", "neuromorphic hardware"), but it does not specify the particular hardware (e.g., CPU or GPU models, cloud instances) used to run the experiments described in the paper.
Software Dependencies | Yes | For reproducibility purposes, the network architecture and all parameters of this CNN were taken from the official TensorFlow tutorial (TensorFlow version 1.3: www.tensorflow.org/tutorials/deep_cnn). To accelerate training compared to the reference from Greff et al. (2017), we used mini-batches of size 32 and the ADAM optimizer (Kingma & Ba, 2014).
Experiment Setup | Yes | For MNIST, we considered a fully connected feed-forward network... For all algorithms we used a learning rate of 0.05 and a batch size of 10 with standard stochastic gradient descent. Learning stopped after 10 epochs. In MNIST, 96.3% accuracy under the constraint of 1% connectivity was achieved with α = 10⁻⁴ and T chosen so that T = η²·10⁻¹². In TIMIT, α = 0.03 and T = 0. (The quoted MNIST hyperparameters are reused in the toy training loop below.)
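
The Pseudocode row refers to Algorithm 1 (DEEP R) and Algorithm 2 (soft-DEEP R) of the paper. Below is a minimal NumPy sketch of one DEEP R parameter-update-and-rewiring step, based on the description of Algorithm 1; the function name deep_r_step, the flat parameter layout, and the tiny reactivation value eps are assumptions of this sketch, not the authors' reference implementation (that code is linked in the Open Source Code row).

```python
import numpy as np

def deep_r_step(theta, sign, grad_w, eta=0.05, alpha=1e-4, temperature=0.0,
                eps=1e-12, rng=None):
    """One DEEP R update on a flat vector of connection parameters.

    theta  -- connection parameters; a connection is treated as active iff theta > 0
    sign   -- fixed signs s_k in {-1, +1} drawn once at initialization
    grad_w -- gradient of the loss w.r.t. the effective weights w_k = s_k * theta_k
    Returns the updated theta and the effective (sparse) weight vector.
    """
    rng = np.random.default_rng() if rng is None else rng
    active = theta > 0

    # Gradient step, L1 penalty (alpha) and optional noise act on active connections only.
    noise = np.sqrt(2.0 * eta * temperature) * rng.standard_normal(theta.shape)
    theta = np.where(active, theta - eta * sign * grad_w - eta * alpha + noise, theta)

    # Connections whose parameter crossed zero become dormant (their weight is exactly 0).
    newly_dormant = active & (theta <= 0)
    active &= ~newly_dormant

    # Reactivate an equal number of randomly chosen dormant connections so that the
    # total number of active connections (the connectivity constraint) is preserved.
    n_reactivate = int(newly_dormant.sum())
    dormant_idx = np.flatnonzero(~active)
    if n_reactivate > 0:
        chosen = rng.choice(dormant_idx, size=min(n_reactivate, dormant_idx.size),
                            replace=False)
        # The paper activates new connections at zero weight; a tiny positive value
        # is used here so the "active iff theta > 0" convention stays consistent.
        theta[chosen] = eps
        active[chosen] = True

    weights = np.where(active, sign * theta, 0.0)
    return theta, weights
```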
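
The Experiment Setup row quotes the MNIST hyperparameters (learning rate 0.05, batch size 10, 10 epochs, α = 10⁻⁴, 1% connectivity). The runnable toy loop below wires those values to the deep_r_step sketch above; it uses synthetic random data and a single linear layer instead of the paper's MNIST network, and it sets the temperature to 0 for simplicity, so it only illustrates how the quoted hyperparameters enter the update.

```python
import numpy as np

# Runnable toy usage of deep_r_step: a single sparse linear layer (784 -> 10)
# trained on synthetic data with the hyperparameters quoted in the table above.
rng = np.random.default_rng(0)
n_in, n_out = 784, 10
n_params = n_in * n_out
connectivity = 0.01                                   # 1% of connections active
eta, alpha, batch_size, epochs = 0.05, 1e-4, 10, 10   # values from the quote

# Fixed signs and initial parameters; only a random 1% start with theta > 0.
sign = rng.choice([-1.0, 1.0], size=n_params)
theta = np.full(n_params, -1.0)
active0 = rng.choice(n_params, size=int(connectivity * n_params), replace=False)
theta[active0] = rng.uniform(0.0, 0.1, size=active0.size)

x_all = rng.standard_normal((1000, n_in))             # synthetic stand-in for MNIST
y_all = rng.integers(0, n_out, size=1000)

weights = np.where(theta > 0, sign * theta, 0.0)
for epoch in range(epochs):
    for start in range(0, len(x_all), batch_size):
        x = x_all[start:start + batch_size]
        y = y_all[start:start + batch_size]
        logits = x @ weights.reshape(n_in, n_out)
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0                # d(cross-entropy)/d(logits)
        grad_w = (x.T @ p / len(y)).reshape(-1)       # gradient w.r.t. the weights
        theta, weights = deep_r_step(theta, sign, grad_w, eta=eta, alpha=alpha,
                                     temperature=0.0, rng=rng)

print("active connections:", int((theta > 0).sum()), "of", n_params)
```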
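
The Dataset Splits row states that 5% of the TIMIT training utterances were selected at random as a validation set for early stopping. A minimal sketch of such a split, assuming a hypothetical list of utterance identifiers (3696 is the commonly used TIMIT training-set size, not a figure taken from the paper):

```python
import numpy as np

# Hold out a random fraction of training utterances as a validation set.
def split_validation(utterance_ids, fraction=0.05, seed=0):
    rng = np.random.default_rng(seed)
    ids = np.array(utterance_ids)
    n_val = max(1, int(round(fraction * len(ids))))
    val_mask = np.zeros(len(ids), dtype=bool)
    val_mask[rng.choice(len(ids), size=n_val, replace=False)] = True
    return ids[~val_mask].tolist(), ids[val_mask].tolist()

train_ids, val_ids = split_validation([f"utt_{i:04d}" for i in range(3696)])
print(len(train_ids), len(val_ids))   # 3511 training / 185 validation utterances
```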