Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Flexible Signal Denoising via Flexible Empirical Bayes Shrinkage
Authors: Zhengrong Xing, Peter Carbonetto, Matthew Stephens
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show through empirical comparisons that the results are competitive with other methods, including both simple thresholding rules and purpose-built empirical Bayes procedures. Our methods are implemented in the R package smashr, SMoothing by Adaptive SHrinkage in R, available at https://www.github.com/stephenslab/smashr. We have conducted a wide range of numerical experiments to compare SMASH against the existing methods for wavelet-based signal denoising. Before presenting the results from these experiments (Section 4.2), we first illustrate the features of SMASH in a small example (Section 4.1). |
| Researcher Affiliation | Academia | Zhengrong Xing (EMAIL), Department of Statistics, University of Chicago, Chicago, IL 60637, USA; Peter Carbonetto (EMAIL), Research Computing Center and Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Matthew Stephens (EMAIL), Department of Statistics and Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA |
| Pseudocode | No | The paper describes algorithms textually, for example: 'The algorithm consists of repeating the following two steps: 1. Estimate µ as if σ² is known (with σ² set to the estimate σ̂² obtained from the previous iteration). 2. Estimate σ² as if µ is known (with µ set to the estimate µ̂ obtained from Step 1).' However, it does not provide any formally structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our methods are implemented in the R package smashr, SMoothing by Adaptive SHrinkage in R, available at https://www.github.com/stephenslab/smashr. |
| Open Datasets | Yes | Here we demonstrate application of SMASH to the motorcycle acceleration data set from Silverman (1985). The data are ChIP-seq read counts for transcription factor YY1 in cell line GM12878 from the ENCODE project (Encyclopedia of DNA Elements; ENCODE Project Consortium, 2011; Dunham et al., 2012; Sloan et al., 2016; Gertz et al., 2013; Landt et al., 2012). |
| Dataset Splits | No | For each combination of simulation settings, we simulated 100 data sets, each with a signal of length T = 1,024, and applied the signal denoising methods to each of the simulated data sets. For each scenario, we simulated 100 data sets. For each combination of test function and intensity range, we simulated 100 data sets, each with a signal of length T = 1,024. The paper describes the simulation of multiple independent datasets and their sizes, but does not provide details on splitting a single dataset into training, validation, and test sets for model evaluation or reproduction. |
| Hardware Specification | Yes | This timing is based on running R 3.4.3 on a MacBook Pro with a 3.5 GHz Intel i7 multicore CPU and no multithreaded external BLAS/LAPACK libraries. |
| Software Dependencies | Yes | Our methods are implemented in the R package smashr, SMoothing by Adaptive SHrinkage in R, available at https://www.github.com/stephenslab/smashr. This timing is based on running R 3.4.3 on a MacBook Pro with a 3.5 GHz Intel i7 multicore CPU and no multithreaded external BLAS/LAPACK libraries. Using the convex optimization library MOSEK (Friberg, 2017), which is interfaced through the KWDual function in the R package REBayes (Koenker and Gu, 2017), fitting the ASH model typically takes about 30 seconds or less for a data set with 100,000 observations. These are implemented in the R package wavethresh (Nason, 2016), for example. |
| Experiment Setup | Yes | The algorithm consists of repeating the following two steps: 1. Estimate µ as if σ² is known (with σ² set to the estimate σ̂² obtained from the previous iteration). 2. Estimate σ² as if µ is known (with µ set to the estimate µ̂ obtained from Step 1). To initialize the algorithm... we found that two iterations of steps 1 and 2 reliably produced accurate results. (So the full procedure consists of initialization, running steps 1 and 2, then running steps 1 and 2 a second time.) For all results shown in the figures and tables below, the methods used the Symmlet8 wavelet basis (Daubechies, 1992). |
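The alternating scheme quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration, not the paper's method: it substitutes a running-mean smoother for SMASH's empirical Bayes wavelet shrinkage, and the function names (`smooth`, `alternating_denoise`) and the two-iteration toy data are our own assumptions. Only the overall structure (estimate µ with σ² fixed, then σ² with µ fixed, repeated twice) follows the quoted description.

```python
import numpy as np

def smooth(y, width=15):
    # Running-mean smoother: a crude stand-in for the paper's
    # wavelet-shrinkage step (hypothetical simplification).
    kernel = np.ones(width) / width
    return np.convolve(y, kernel, mode="same")

def alternating_denoise(y, n_iters=2):
    """Sketch of the quoted two-step alternation:
    1. estimate the mean mu as if the variance sigma^2 were known;
    2. estimate sigma^2 as if mu were known (smooth the squared
       residuals);
    repeated n_iters times (the paper reports two iterations suffice).
    """
    mu = smooth(y)  # initialization
    for _ in range(n_iters):
        # Step 2: variance estimate given mu.
        sigma2 = smooth((y - mu) ** 2)
        # Step 1 (next pass): mean estimate given sigma^2. SMASH would
        # use sigma2 to weight the shrinkage; this sketch does not.
        mu = smooth(y)
    return mu, sigma2

# Toy usage: a sine signal with heteroskedastic Gaussian noise,
# echoing the paper's varying-variance setting.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 1024)
y = np.sin(t) + rng.normal(scale=0.1 + 0.2 * t / (2 * np.pi), size=t.size)
mu_hat, sigma2_hat = alternating_denoise(y)
```

The point of the alternation is that each sub-problem is easy once the other quantity is held fixed; a real implementation would plug σ̂² back into a variance-weighted shrinkage step rather than ignore it as this sketch does.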