Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bayesian Nonparametric Covariance Regression
Authors: Emily B. Fox, David B. Dunson
JMLR 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A number of simulation studies are examined in Section 4, with an application to the Google Flu Trends data set presented in Section 5. We assess the performance of the proposed approach in terms of both covariance estimation and predictive performance. In Case 1 we simulated from the proposed model, while in Case 2 we simulated from a parametric model. |
| Researcher Affiliation | Academia | Emily B. Fox EMAIL Department of Statistics University of Washington Seattle, WA 98195-4322, USA David B. Dunson EMAIL Department of Statistical Science Duke University Durham, NC 27708-0251, USA |
| Pseudocode | Yes | Based on a fixed truncation level L and a latent factor dimension k, we propose a Gibbs sampler for posterior computation. For the model of Section 2, the full joint probability is given by p_obs · p_params · p_hypers, where... The resulting sampler is outlined in Steps 1-5 below. Step 1 is derived in Appendix B. In this section, we equivalently represent the latent factor process of (2) as η_i = ψ(x_i) + ν_i, with ν_i ∼ N_k(0, I_k). Step 1. Update each basis function ξ_ℓm from the conditional posterior given {y_i}, Θ, {η_i}, Σ_0. ... Step 6. Finally, for the hyperparameters in the shrinkage prior for Θ, we have... |
| Open Source Code | No | The paper does not contain any explicit statement about making source code available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | As a motivating example, we focus on the problem of modeling the changing correlations in flu activity amongst a large collection of regions in the United States as a function of time. The Google Flu Trends data set (available at http://www.google.org/flutrends/) provides estimates of flu activity in 183 regions on a weekly basis. |
| Dataset Splits | Yes | To generate a hold out sample, we removed 48 of the 1,000 observations by deleting observations yij with probability pi, where pi was chosen to vary with xi to slightly favor removal in regions with more concentrated conditional response distributions. ... More specifically, from the available observations (omitting the significant number of truly missing observations), we randomly held out 10% of the values uniformly across time and regions. ... We run this experiment of randomly holding out 10% of the observed data twice. |
| Hardware Specification | Yes | Each of our chains of 10,000 Gibbs iterations based on a naive implementation in MATLAB (R2010b) took approximately 12 hours on a machine with four Intel Xeon X5550 Quad Core 2.67GHz processors and 48 GB of RAM. |
| Software Dependencies | Yes | Each of our chains of 10,000 Gibbs iterations based on a naive implementation in MATLAB (R2010b) took approximately 12 hours on a machine with four Intel Xeon X5550 Quad Core 2.67GHz processors and 48 GB of RAM. |
| Experiment Setup | Yes | In Case 1, we let X = {1, . . . , 100}, p = 10, L = 5, k = 4, a1 = a2 = 10, γ = 3, aσ = 1, bσ = 0.1 and κψ = κ = 10 in the Gaussian process after scaling X to (0, 1], with an additional nugget of 1e-5·I_n added to K. ... We set a1 = a2 = 2, γ = 3, and placed a Ga(1, 0.1) prior on the precision parameters σ_j^{-2}. The length-scale parameter κ was set from the data according to the heuristic described in Appendix C, and was determined to be 10 (after rounding). Details on initialization are available in Appendix D. We simulated 10,000 Gibbs iterations, discarded the first 5,000 and saved every 10th iteration. ... We simulated 5 chains each for 10,000 MCMC iterations, discarded the first 5,000 for burn-in, and thinned the chains by keeping every 10th sample. Each chain was initialized with parameters sampled from the prior. The hyperparameters were set as in the simulation study, except with larger truncation levels L = 10 and k = 20 and with the Gaussian process length-scale hyperparameter set to κψ = κ = 100 to account for the time scale (weeks) and the rate at which ILI incidences change. |
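The latent-factor representation quoted in the Pseudocode row, η_i = ψ(x_i) + ν_i with ν_i ∼ N_k(0, I_k), can be sketched as follows. This is a minimal illustration, not the authors' code: the choice of ψ, the seed, and the dimensions n and k are assumptions picked for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

k, n = 4, 100  # latent factor dimension and number of inputs (illustrative)

# Placeholder for the latent mean process psi(x); the paper models psi
# nonparametrically, so this sinusoid is purely illustrative.
def psi(x):
    return np.sin(np.outer(x, np.arange(1, k + 1)))

x = np.linspace(0, 1, n)                      # inputs scaled to (0, 1]
eta = psi(x) + rng.standard_normal((n, k))    # nu_i ~ N_k(0, I_k)
print(eta.shape)
```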
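The second hold-out scheme quoted in the Dataset Splits row (10% of observed entries removed uniformly at random across time and regions) can be sketched like this. The matrix shape, seed, and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data matrix: rows = time points, columns = regions (illustrative).
Y = rng.normal(size=(1000, 10))
observed = np.ones(Y.shape, dtype=bool)

# Hold out 10% of the observed entries uniformly at random.
idx = np.flatnonzero(observed)
held_out = rng.choice(idx, size=int(0.1 * idx.size), replace=False)
mask = observed.copy()
mask.flat[held_out] = False  # False marks held-out entries

print(mask.sum(), (~mask).sum())  # retained vs. held-out counts
```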
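The MCMC post-processing schedule quoted above (10,000 iterations, first 5,000 discarded as burn-in, every 10th retained) amounts to a simple slice. The function name and the stand-in chain below are illustrative assumptions; the paper's chains were run in MATLAB.

```python
import numpy as np

def thin_chain(samples, burn_in=5000, thin=10):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return samples[burn_in::thin]

chain = np.random.randn(10000, 3)  # stand-in for 10,000 posterior draws
kept = thin_chain(chain)
print(kept.shape[0])               # 500 retained draws per chain
```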