DEEP NEURAL NETWORK INITIALIZATION WITH SPARSITY INDUCING ACTIVATIONS

Authors: Ilan Price, Nicholas Daultry Ball, Adam Christopher Jones, Samuel Chun Hei Lam, Jared Tanner

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments verify the theory and show that the proposed magnitude clipped sparsifying activations can be trained with training and test fractional sparsity as high as 85% while retaining close to full accuracy.
Researcher Affiliation | Academia | Ilan Price*,†, Nicholas Daultry Ball*, Samuel C.H. Lam*, Adam C. Jones* & Jared Tanner* (*) Mathematical Institute, University of Oxford (†) The Alan Turing Institute {ilan.price,nicholas.daultryball,samuel.lam,adam.c.jones,tanner}@maths.ox.ac.uk
Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We train both feedforward networks (abridged as DNNs) of width 300 and depth 100 using CReLUτ,m and CSTτ,m to classify digits from the MNIST dataset and, similarly, CNNs with 300 channels in each layer and depth 50 are trained to classify images from the CIFAR10 dataset. (See the activation sketch after this table.)
Dataset Splits | Yes | For both MNIST and CIFAR10, 10% of the training set was held out as the validation set.
Hardware Specification | Yes | Experiments were run on a single V100 GPU.
Software Dependencies | No | The paper states 'implemented using Pytorch Lightning' but does not specify version numbers for PyTorch, Lightning, or any other critical software dependencies.
Experiment Setup | Yes | The networks are initialized at the EoC using q = 1, before being trained by stochastic gradient descent (SGD) for 200 epochs with learning rates of 10^-4 and 10^-3 for the DNN and CNN respectively. (See the training sketch after this table.)
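
For readers who want to prototype the setup quoted in the Open Datasets row, the following is a minimal PyTorch sketch of magnitude-clipped sparsifying activations in the spirit of CReLUτ,m and CSTτ,m. The functional forms assumed here (a soft threshold at τ clipped at magnitude m, and a thresholded ReLU clipped at m), the class names, and the sparsity helper are assumptions for illustration, not the paper's exact definitions.

```python
import torch
import torch.nn as nn


class CST(nn.Module):
    """Clipped soft thresholding: sign(x) * min(max(|x| - tau, 0), m).

    Assumed form of CST_{tau,m}: tau sets the fraction of exact zeros,
    m clips the output magnitude. Check against the paper's definition.
    """

    def __init__(self, tau: float = 1.0, m: float = 1.0):
        super().__init__()
        self.tau, self.m = tau, m

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sign(x) * torch.clamp(x.abs() - self.tau, min=0.0, max=self.m)


class CReLU(nn.Module):
    """Clipped thresholded ReLU: min(max(x - tau, 0), m) -- assumed form of CReLU_{tau,m}."""

    def __init__(self, tau: float = 1.0, m: float = 1.0):
        super().__init__()
        self.tau, self.m = tau, m

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.clamp(x - self.tau, min=0.0, max=self.m)


def fractional_sparsity(x: torch.Tensor) -> float:
    """Fraction of exactly-zero activations, e.g. to compare against the ~85% figure quoted above."""
    return (x == 0).float().mean().item()
```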
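
Similarly, a hedged sketch of the data split and training configuration described in the Dataset Splits and Experiment Setup rows: 10% of the MNIST training set held out for validation, a width-300 feedforward stack reusing the CST class from the sketch above, and plain SGD for 200 epochs at the DNN learning rate of 10^-4. The batch size and the edge-of-chaos variances (σ_w², σ_b²) are not given in the excerpts here, so those values are placeholders, and the EoC initialization shown is only schematic.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Placeholder edge-of-chaos variances: (sigma_w^2, sigma_b^2) must come from the
# paper's variance map for the chosen clipped activation with q = 1;
# the numbers below are NOT the paper's values.
SIGMA_W2, SIGMA_B2 = 2.0, 0.1


def eoc_init_(model: nn.Module) -> None:
    """Schematic EoC-style Gaussian init: Var(W_ij) = sigma_w^2 / fan_in."""
    for layer in model.modules():
        if isinstance(layer, nn.Linear):
            nn.init.normal_(layer.weight, std=(SIGMA_W2 / layer.in_features) ** 0.5)
            nn.init.normal_(layer.bias, std=SIGMA_B2 ** 0.5)


# MNIST with 10% of the training set held out as validation, as stated above.
train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
n_val = int(0.1 * len(train_full))
train_set, val_set = random_split(train_full, [len(train_full) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size assumed
val_loader = DataLoader(val_set, batch_size=128)

# Roughly width-300, depth-100 feedforward classifier using the CST sketch above.
layers = [nn.Flatten(), nn.Linear(28 * 28, 300), CST()]
for _ in range(98):
    layers += [nn.Linear(300, 300), CST()]
layers += [nn.Linear(300, 10)]
model = nn.Sequential(*layers)

eoc_init_(model)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # DNN learning rate from the quote
loss_fn = nn.CrossEntropyLoss()

# One epoch of the 200-epoch SGD loop described in the Experiment Setup row.
for x, y in train_loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```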