Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Flexible Signal Denoising via Flexible Empirical Bayes Shrinkage
Authors: Zhengrong Xing, Peter Carbonetto, Matthew Stephens
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show through empirical comparisons that the results are competitive with other methods, including both simple thresholding rules and purpose-built empirical Bayes procedures. Our methods are implemented in the R package smashr, SMoothing by Adaptive SHrinkage in R, available at https://www.github.com/stephenslab/smashr. We have conducted a wide range of numerical experiments to compare SMASH against the existing methods for wavelet-based signal denoising. Before presenting the results from these experiments (Section 4.2), we first illustrate the features of SMASH in a small example (Section 4.1). |
| Researcher Affiliation | Academia | Zhengrong Xing (EMAIL), Department of Statistics, University of Chicago, Chicago, IL 60637, USA; Peter Carbonetto (EMAIL), Research Computing Center and Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Matthew Stephens (EMAIL), Department of Statistics and Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA |
| Pseudocode | No | The paper describes algorithms textually, for example: 'The algorithm consists of repeating the following two steps: 1. Estimate µ as if σ² is known (with σ² set to the estimate σ̂² obtained from the previous iteration). 2. Estimate σ² as if µ is known (with µ set to the estimate µ̂ obtained from Step 1).' However, it does not provide any formally structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our methods are implemented in the R package smashr, SMoothing by Adaptive SHrinkage in R, available at https://www.github.com/stephenslab/smashr. |
| Open Datasets | Yes | Here we demonstrate application of SMASH to the motorcycle acceleration data set from Silverman (1985). The data are ChIP-seq read counts for transcription factor YY1 in cell line GM12878 from the ENCODE project (Encyclopedia of DNA Elements; ENCODE Project Consortium, 2011; Dunham et al., 2012; Sloan et al., 2016; Gertz et al., 2013; Landt et al., 2012). |
| Dataset Splits | No | For each combination of simulation settings, we simulated 100 data sets, each with a signal of length T = 1,024, and applied the signal denoising methods to each of the simulated data sets. For each scenario, we simulated 100 data sets. For each combination of test function and intensity range, we simulated 100 data sets, each with a signal of length T = 1,024. The paper describes the simulation of multiple independent datasets and their sizes, but does not provide details on splitting a single dataset into training, validation, and test sets for model evaluation or reproduction. |
| Hardware Specification | Yes | This timing is based on running R 3.4.3 on a MacBook Pro with a 3.5 GHz Intel i7 multicore CPU and no multithreaded external BLAS/LAPACK libraries. |
| Software Dependencies | Yes | Our methods are implemented in the R package smashr, SMoothing by Adaptive SHrinkage in R, available at https://www.github.com/stephenslab/smashr. This timing is based on running R 3.4.3 on a MacBook Pro with a 3.5 GHz Intel i7 multicore CPU and no multithreaded external BLAS/LAPACK libraries. Using the convex optimization library MOSEK (Friberg, 2017), which is interfaced through the KWDual function in the R package REBayes (Koenker and Gu, 2017), fitting the ASH model typically takes about 30 seconds or less for a data set with 100,000 observations. These are implemented in the R package wavethresh (Nason, 2016), for example. |
| Experiment Setup | Yes | The algorithm consists of repeating the following two steps: 1. Estimate µ as if σ² is known (with σ² set to the estimate σ̂² obtained from the previous iteration). 2. Estimate σ² as if µ is known (with µ set to the estimate µ̂ obtained from Step 1). To initialize the algorithm... we found that two iterations of steps 1 and 2 reliably produced accurate results. (So the full procedure consists of initialization, running steps 1 and 2, then running steps 1 and 2 a second time.) For all results shown in the figures and tables below, the methods used the Symmlet8 wavelet basis (Daubechies, 1992). |
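The alternating scheme quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration, not the paper's method: it substitutes a running-mean smoother for SMASH's empirical Bayes wavelet shrinkage, and the function names (`smooth`, `alternating_denoise`) and the two-iteration toy data are our own assumptions. Only the overall structure (estimate µ with σ² fixed, then σ² with µ fixed, repeated twice) follows the quoted description.

```python
import numpy as np

def smooth(y, width=15):
    # Running-mean smoother: a crude stand-in for the paper's
    # wavelet-shrinkage step (hypothetical simplification).
    kernel = np.ones(width) / width
    return np.convolve(y, kernel, mode="same")

def alternating_denoise(y, n_iters=2):
    """Sketch of the quoted two-step alternation:
    1. estimate the mean mu as if the variance sigma^2 were known;
    2. estimate sigma^2 as if mu were known (smooth the squared
       residuals);
    repeated n_iters times (the paper reports two iterations suffice).
    """
    mu = smooth(y)  # initialization
    for _ in range(n_iters):
        # Step 2: variance estimate given mu.
        sigma2 = smooth((y - mu) ** 2)
        # Step 1 (next pass): mean estimate given sigma^2. SMASH would
        # use sigma2 to weight the shrinkage; this sketch does not.
        mu = smooth(y)
    return mu, sigma2

# Toy usage: a sine signal with heteroskedastic Gaussian noise,
# echoing the paper's varying-variance setting.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 1024)
y = np.sin(t) + rng.normal(scale=0.1 + 0.2 * t / (2 * np.pi), size=t.size)
mu_hat, sigma2_hat = alternating_denoise(y)
```

The point of the alternation is that each sub-problem is easy once the other quantity is held fixed; a real implementation would plug σ̂² back into a variance-weighted shrinkage step rather than ignore it as this sketch does.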