SPDY: Accurate Pruning with Speedup Guarantees
Authors: Elias Frantar, Dan Alistarh
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across popular vision and language models show that SPDY guarantees speedups while recovering higher accuracy relative to existing strategies, both for one-shot and gradual pruning scenarios, and is compatible with most existing pruning approaches. |
| Researcher Affiliation | Collaboration | IST Austria; Neural Magic. |
| Pseudocode | Yes | Algorithm 1 We efficiently compute the optimal layer-wise sparsity profile with execution time at most T given S, e^s_ℓ, t^s_ℓ and assuming that time is discretized, using bottom-up dynamic programming. ... Algorithm 2 Collect layer-wise timings t^s_ℓ. ... Algorithm 3 Generate reconstruction database entries W^s_ℓ. ... Algorithm 4 SPDY search for optimal sensitivity values c. (A dynamic-programming sketch of Algorithm 1 follows the table.) |
| Open Source Code | Yes | We provide efficient implementations of our methods at https://github.com/IST-DASLab/spdy. |
| Open Datasets | Yes | For experiments on ImageNet (Deng et al., 2009) we follow (Hubara et al., 2021a), by defining the calibration set for AP, gAP and the profile search to contain exactly one randomly-selected training image per class. For other tasks, we select 1000 training samples at random for the calibration set. ... the YOLOv5 (Jocher, 2022) object detector, and the widely used BERT-base for question answering (Devlin et al., 2019) on the SQuAD dataset (Rajpurkar et al., 2016). |
| Dataset Splits | Yes | The actual quality of this profile is then determined by stitching together layers from the reconstruction database (see Section 3.4) and computing the loss of the composite model on a small calibration set. In practice, we use the same data for validation as for the AP; similar to (Hubara et al., 2021a), we do not observe any overfitting. ... For experiments on ImageNet (Deng et al., 2009) we follow (Hubara et al., 2021a), by defining the calibration set for AP, gAP and the profile search to contain exactly one randomly-selected training image per class. |
| Hardware Specification | Yes | This is executed on a single NVIDIA 3090 GPU, and can be significantly optimized. ... Layer-wise timings for the AMD system are collected on an Amazon AWS c5a.8xlarge machine with 16 cores, while for Intel CPUs we use a c5.9xlarge server with 18 cores. ... on an AWS c5.12xlarge instance. |
| Software Dependencies | Yes | We measure speedups and execute inference on the publicly-available DeepSparse v0.9.1 CPU inference engine (Neural Magic, 2021; Kurtz et al., 2020), which is competitive when executing dense models with the standard ONNX and OpenVINO runtimes... |
| Experiment Setup | Yes | In all our experiments, we use the same set of sparsity targets for each layer S = {0} ∪ {1 − (1 − 0.4) · δ^i \| i = 0, . . . , 40} with δ = ((1 − 0.99)/(1 − 0.4))^(1/40). ... For time discretization, we always use B = 10^4 buckets as individual units of time. ... The reconstruction database generation performs 10 epochs of optimization over this calibration set, using Adam (Kingma & Ba, 2015) with batchsize 32 and learning rate 10^−3 per sparsity level, while gAP runs for 100 epochs with learning rate 10^−5 and frozen batch norms. ... The listed speedups are for batchsize 64, except for BERT (Devlin et al., 2019), which uses batchsize 16. (The sparsity grid is reproduced in the second sketch below.) |
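
To make the quoted Algorithm 1 caption concrete, below is a minimal Python sketch of a bottom-up dynamic program of that shape: it assumes the per-layer errors e^s_ℓ and integer-bucket timings t^s_ℓ have already been collected, and picks one sparsity level per layer so that summed error is minimized under a total time budget. All names here (`optimal_profile`, `errors`, `times`, `budget`) are illustrative and not the authors' code; the official implementation is at https://github.com/IST-DASLab/spdy.

```python
import math

def optimal_profile(errors, times, budget):
    """Choose one sparsity level per layer, minimizing total error
    subject to total (bucketized) execution time <= budget.

    errors[l][s]: loss increase of layer l at sparsity level s
    times[l][s]:  integer time buckets of layer l at sparsity level s
    """
    num_layers = len(errors)
    # best[t]: minimal cumulative error reachable spending exactly t buckets
    best = [0.0] + [math.inf] * budget
    choice = [[None] * (budget + 1) for _ in range(num_layers)]

    for l in range(num_layers):
        new_best = [math.inf] * (budget + 1)
        for t in range(budget + 1):
            if math.isinf(best[t]):
                continue
            for s, (e, dt) in enumerate(zip(errors[l], times[l])):
                t2 = t + dt
                if t2 <= budget and best[t] + e < new_best[t2]:
                    new_best[t2] = best[t] + e
                    choice[l][t2] = (s, t)  # remember level and predecessor time
        best = new_best

    # Pick the cheapest feasible end state and backtrack the profile.
    t = min(range(budget + 1), key=lambda i: best[i])
    if math.isinf(best[t]):
        raise ValueError("no profile fits within the time budget")
    total_error, profile = best[t], []
    for l in reversed(range(num_layers)):
        s, t = choice[l][t]
        profile.append(s)
    return profile[::-1], total_error
```

The cost of this sketch is O(L · B · |S|) for L layers, B time buckets, and |S| sparsity levels, which is consistent with the paper's choice of discretizing time into B = 10^4 buckets to keep the search tractable.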
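Similarly, the sparsity-target set quoted in the Experiment Setup row can be reproduced in a few lines; this is a sketch of the formula as reconstructed above, with variable names of our choosing:

```python
# Dense option (0) plus 41 targets whose remaining density (1 - sparsity)
# decays geometrically from 0.6 down to 0.01, i.e. sparsity 0.40 -> 0.99.
delta = ((1 - 0.99) / (1 - 0.4)) ** (1 / 40)
targets = [0.0] + [1 - (1 - 0.4) * delta**i for i in range(41)]
assert abs(targets[1] - 0.40) < 1e-9 and abs(targets[-1] - 0.99) < 1e-9
print(len(targets), min(targets), max(targets))  # 42 levels in [0, 0.99]
```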