Why Random Pruning Is All We Need to Start Sparse

Authors: Advait Harshal Gadhikar, Sohom Mukherjee, Rebekka Burkholz

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We offer a theoretical explanation of how random masks can approximate arbitrary target networks if they are wider by a logarithmic factor in the inverse sparsity 1/log(1/sparsity)... We demonstrate the feasibility of this approach in experiments for different pruning methods and propose particularly effective choices of initial layer-wise sparsity ratios of the random source network.
Researcher Affiliation | Academia | CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Correspondence to: Advait Gadhikar <advait.gadhikar@cispa.de>.
Pseudocode | No | The paper describes algorithms and procedures in prose and mathematical notation but does not include any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | Our code is available at https://github.com/RelationalML/sparse_to_sparse.
Open Datasets | Yes | We conduct our experiments with two datasets built for image classification tasks: CIFAR10 and CIFAR100 (Krizhevsky et al., 2009) and Tiny ImageNet (Russakovsky et al., 2015b).
Dataset Splits | Yes | We use the validation set provided by the creators of Tiny ImageNet (Russakovsky et al., 2015b) as a test set to measure the generalization performance of our trained models.
Hardware Specification | Yes | All our experiments were run with 4 Nvidia A100 GPUs.
Software Dependencies | No | The paper mentions using Stochastic Gradient Descent (SGD) and the Adam optimizer, and states that its code builds on work by Liu et al. (2021), Tanaka et al. (2020), and Kusupati et al. (2020). However, it does not provide specific version numbers for any software libraries or dependencies.
Experiment Setup | Yes | Each model is trained using Stochastic Gradient Descent (SGD) with learning rate 0.1 and momentum 0.9 with weight decay 0.0005 and batch size 128. We use the same hyperparameters as (Ma et al., 2021) and train every model for 160 epochs.
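
The quoted abstract and the linked repository both revolve around starting from a randomly pruned source network with chosen layer-wise sparsity ratios. The sketch below shows one minimal way such random masks could be drawn in PyTorch; the helper names (`random_mask`, `apply_random_pruning`) and the per-layer density dictionary are illustrative assumptions, not the actual API of the sparse_to_sparse repository.

```python
import torch
import torch.nn as nn


def random_mask(weight: torch.Tensor, density: float) -> torch.Tensor:
    """Draw a binary mask keeping roughly `density` of the entries uniformly at random."""
    num_params = weight.numel()
    num_keep = max(1, int(round(density * num_params)))
    keep_idx = torch.randperm(num_params, device=weight.device)[:num_keep]
    mask = torch.zeros(num_params, device=weight.device)
    mask[keep_idx] = 1.0
    return mask.view_as(weight)


def apply_random_pruning(model: nn.Module, layer_densities: dict) -> dict:
    """Zero out selected layers according to per-layer random masks (hypothetical helper)."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)) and name in layer_densities:
            mask = random_mask(module.weight.data, layer_densities[name])
            module.weight.data.mul_(mask)  # start sparse: prune at initialization
            masks[name] = mask
    return masks
```

To keep the network sparse during training, the same masks would be re-applied to the weights (or their gradients) after every optimizer step; the paper compares several choices of the layer-wise sparsity ratios rather than any particular dictionary of ratios assumed here.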
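The Open Datasets and Dataset Splits rows above describe standard benchmarks, with the Tiny ImageNet validation split used as the test set. A minimal torchvision loading sketch under that reading follows; the ./data and ./tiny-imagenet-200 paths, and the assumption that the Tiny ImageNet validation images have been rearranged into per-class folders, are illustrative and not taken from the paper or its code.

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# CIFAR10 and CIFAR100 ship with fixed train/test splits in torchvision.
cifar10_train = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
cifar10_test = datasets.CIFAR10("./data", train=False, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100("./data", train=True, download=True, transform=to_tensor)
cifar100_test = datasets.CIFAR100("./data", train=False, download=True, transform=to_tensor)

# Tiny ImageNet is distributed as image folders; following the quoted split, its
# provided validation set serves as the test set. This assumes the validation images
# have been reorganized into one subfolder per class, which the raw download does not do.
tiny_train = datasets.ImageFolder("./tiny-imagenet-200/train", transform=to_tensor)
tiny_test = datasets.ImageFolder("./tiny-imagenet-200/val", transform=to_tensor)
```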
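The Experiment Setup row quotes the optimizer settings directly (SGD, learning rate 0.1, momentum 0.9, weight decay 0.0005, batch size 128, 160 epochs). The sketch below wires those numbers into a PyTorch training loop; the ResNet-18 architecture, the CIFAR10 loader, the augmentation choices, and the absence of a learning-rate schedule are placeholders rather than the paper's exact pipeline, which follows Ma et al. (2021).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Hyperparameters quoted in the Experiment Setup row.
LR, MOMENTUM, WEIGHT_DECAY, BATCH_SIZE, EPOCHS = 0.1, 0.9, 5e-4, 128, 160

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # standard CIFAR augmentation (assumed, not quoted)
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=train_transform)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)

model = models.resnet18(num_classes=10)  # placeholder model, not necessarily the paper's architecture
optimizer = torch.optim.SGD(model.parameters(), lr=LR,
                            momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```

In the sparse-training setting of the paper, the random masks from the first sketch would be re-applied to the weights after each optimizer.step() call so that pruned connections remain zero throughout training.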