Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DEEP NEURAL NETWORK INITIALIZATION WITH SPARSITY INDUCING ACTIVATIONS
Authors: Ilan Price, Nicholas Daultry Ball, Adam Christopher Jones, Samuel Chun Hei Lam, Jared Tanner
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments verify the theory and show that the proposed magnitude-clipped sparsifying activations can be trained with training and test fractional sparsity as high as 85% while retaining close to full accuracy. |
| Researcher Affiliation | Academia | Ilan Price*, Nicholas Daultry Ball*, Samuel C.H. Lam*, Adam C. Jones* & Jared Tanner*. (*) Mathematical Institute, University of Oxford; The Alan Turing Institute. {ilan.price,nicholas.daultryball,samuel.lam,adam.c.jones,tanner}@maths.ox.ac.uk |
| Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We train both feedforward networks (abridged as DNNs) of width 300 and depth 100 using the CReLU and CST activations to classify digits from the MNIST dataset and, similarly, CNNs with 300 channels in each layer and depth 50 are trained to classify images from the CIFAR10 dataset. |
| Dataset Splits | Yes | For both MNIST and CIFAR10, 10% of the training set was held out as the validation set. |
| Hardware Specification | Yes | Experiments were run on a single V100 GPU |
| Software Dependencies | No | The paper states 'implemented using Pytorch Lightning' but does not specify version numbers for PyTorch, Lightning, or any other critical software dependencies. |
| Experiment Setup | Yes | The networks are initialized at the EoC (edge of chaos) using q = 1, before being trained by stochastic gradient descent (SGD) for 200 epochs with learning rates of 10⁻⁴ and 10⁻³ for the DNN and CNN respectively. |
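The setup details scattered across the table (a 10% validation hold-out, a magnitude-clipped sparsifying activation, and SGD with the reported learning rates) can be collected into a minimal sketch. This is an illustrative assumption, not the authors' code: the helper names, the plain-NumPy clipped ReLU, and its exact parameterisation are hypothetical stand-ins for the paper's CReLU/CST activations.

```python
import numpy as np

def clipped_relu(x, m=1.0):
    # Hypothetical magnitude-clipped ReLU: zero below 0, capped at m above.
    # The paper's CReLU/CST activations may be parameterised differently.
    return np.clip(x, 0.0, m)

def fractional_sparsity(a):
    # Fraction of exactly-zero entries in an activation array
    # (the quantity the report says reaches up to 85%).
    return float(np.mean(a == 0.0))

def holdout_split(n_train, val_frac=0.10, seed=0):
    # 10% of the training set held out as validation, per the report.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_train)
    n_val = int(val_frac * n_train)
    return idx[n_val:], idx[:n_val]

# Hyperparameters as reported: SGD, 200 epochs,
# lr 1e-4 for the DNN and 1e-3 for the CNN.
HPARAMS = {"optimizer": "SGD", "epochs": 200,
           "lr": {"dnn": 1e-4, "cnn": 1e-3}}

train_idx, val_idx = holdout_split(60000)  # MNIST training-set size
acts = clipped_relu(np.array([-1.0, 0.5, 2.0]))
print(len(train_idx), len(val_idx), fractional_sparsity(acts))
```

The sparsity metric counts exact zeros because the clipped activations produce hard zeros below the threshold, which is what makes the post-activation representations sparse at inference time.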