Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Improved Shrinkage Prediction under a Spiked Covariance Structure

Authors: Trambak Banerjee, Gourab Mukherjee, Debashis Paul

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present simulation experiments as well as real data examples illustrating the efficacy of the proposed method.
Researcher Affiliation	Academia	Trambak Banerjee EMAIL Analytics, Information and Operations Management University of Kansas Lawrence, KS 66045, USA Gourab Mukherjee EMAIL Data Sciences and Operations University of Southern California Los Angeles, CA 90089, USA Debashis Paul EMAIL Department of Statistics University of California, Davis Davis, CA 95616, USA
Pseudocode	No	The paper describes its methodology in narrative text and mathematical formulations (e.g., Section 3. Proposed methodology for disaggregated model) but does not include any explicit pseudocode or algorithm blocks.
Open Source Code	Yes	The R package casp has been developed to implement our proposed CASP methodology in aggregated as well as disaggregated prediction problems. It is publicly available at the following GitHub repository: https://github.com/trambakbanerjee/casp.
Open Datasets	Yes	In this section we analyze a part of the dataset published by Bronnenberg et al. (2008).
Dataset Splits	Yes	We use 3 weeks from a relatively recent snapshot covering October 31, 2011 to November 20, 2011 as data from the current model... We use the most recent T = 2 weeks, from November 7, 2011 to November 20, 2011 as our prediction period and utilize the sales data of week t − 1 to predict the state aggregated totals for week t where t = 1, ..., T.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions the use of 'R package casp', 'R package esaBcv', 'R package FACTMLE', 'R-package POET', and 'R-package splines2' but does not specify their version numbers.
Experiment Setup	Yes	In the setup of experiment 1 we investigate the prediction performance of the five predictive rules under the disaggregated model (A = In) and sample θ from an n = 200 variate Gaussian distribution with mean vector η0 = 0 and covariance τΣβ. We impose a spike covariance structure on Σ with K = 10 spikes under the following two scenarios with l0 fixed at 1. Scenario 1: we consider the generalized absolute loss function in equation (3) with bi sampled uniformly between (0.9, 0.95), hi = 1 − bi with (τ, β) = (0.5, 0.25) and K spikes equi-spaced between 80 and 20. Scenario 2: we consider the Linex loss function in equation (4) with ai sampled uniformly between ( 2, 1), bi = 1 with (τ, β) = (1, 1.75) and K spikes equi-spaced between 25 and 5. For our prediction problem, we use a threshold of sp units for product p and consider only those outlets that have sold at least sp units in week 0. In particular, we use the function smooth.spline from the R-package splines2 and choose k = 3 knots corresponding to the 25, 50 and 95 percentiles of the sales distribution across the np stores at each of the m weeks.
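
The Scenario 1 sampling step quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code (their implementation is the R package casp); it rests on three assumptions that the excerpt does not confirm: that "τΣβ" denotes τ·Σ^β with the exponent flattened by text extraction, that Σ is diagonal (the excerpt gives no eigenvectors), and that the non-spike eigenvalues all equal l0 = 1. The function names spiked_eigenvalues and sample_theta are hypothetical.

```python
import random


def spiked_eigenvalues(n=200, k=10, top=80.0, bottom=20.0, l0=1.0):
    """Return n eigenvalues: k spikes equi-spaced from top down to bottom,
    followed by n - k bulk eigenvalues equal to l0 (assumed bulk value)."""
    step = (top - bottom) / (k - 1)
    spikes = [top - j * step for j in range(k)]
    return spikes + [l0] * (n - k)


def sample_theta(eigenvalues, tau, beta, rng):
    """Draw theta ~ N(0, tau * Sigma^beta) under the ASSUMPTION that Sigma is
    diagonal, so each coordinate is independent with variance tau * l_i^beta."""
    return [rng.gauss(0.0, (tau * l ** beta) ** 0.5) for l in eigenvalues]


rng = random.Random(0)  # fixed seed so the draw is repeatable
eig = spiked_eigenvalues()  # Scenario 1: spikes between 80 and 20
theta = sample_theta(eig, tau=0.5, beta=0.25, rng=rng)
```

Scenario 2 would correspond to calling spiked_eigenvalues(top=25.0, bottom=5.0) and sample_theta with tau=1 and beta=1.75 under the same assumptions. A non-diagonal Σ would require choosing eigenvectors, which the excerpt does not specify.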