Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bayesian Nonparametric Covariance Regression
Authors: Emily B. Fox, David B. Dunson
JMLR 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A number of simulation studies are examined in Section 4, with an application to the Google Flu Trends data set presented in Section 5. We assess the performance of the proposed approach in terms of both covariance estimation and predictive performance. In Case 1 we simulated from the proposed model, while in Case 2 we simulated from a parametric model. |
| Researcher Affiliation | Academia | Emily B. Fox EMAIL Department of Statistics University of Washington Seattle, WA 98195-4322, USA David B. Dunson EMAIL Department of Statistical Science Duke University Durham, NC 27708-0251, USA |
| Pseudocode | Yes | Based on a fixed truncation level L and a latent factor dimension k, we propose a Gibbs sampler for posterior computation. For the model of Section 2, the full joint probability is given by p_obs · p_params · p_hypers, where... The resulting sampler is outlined in Steps 1-5 below. Step 1 is derived in Appendix B. In this section, we equivalently represent the latent factor process of (2) as η_i = ψ(x_i) + ν_i, with ν_i ∼ N_k(0, I_k). Step 1. Update each basis function ξ_ℓm from the conditional posterior given {y_i}, Θ, {η_i}, Σ_0. ... Step 6. Finally, for the hyperparameters in the shrinkage prior for Θ, we have... |
| Open Source Code | No | The paper does not contain any explicit statement about making source code available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | As a motivating example, we focus on the problem of modeling the changing correlations in flu activity amongst a large collection of regions in the United States as a function of time. The Google Flu Trends data set (available at http://www.google.org/flutrends/) provides estimates of flu activity in 183 regions on a weekly basis. |
| Dataset Splits | Yes | To generate a hold out sample, we removed 48 of the 1,000 observations by deleting observations yij with probability pi, where pi was chosen to vary with xi to slightly favor removal in regions with more concentrated conditional response distributions. ... More specifically, from the available observations (omitting the significant number of truly missing observations), we randomly held out 10% of the values uniformly across time and regions. ... We run this experiment of randomly holding out 10% of the observed data twice. |
| Hardware Specification | Yes | Each of our chains of 10,000 Gibbs iterations based on a naive implementation in MATLAB (R2010b) took approximately 12 hours on a machine with four Intel Xeon X5550 Quad Core 2.67GHz processors and 48 GB of RAM. |
| Software Dependencies | Yes | Each of our chains of 10,000 Gibbs iterations based on a naive implementation in MATLAB (R2010b) took approximately 12 hours on a machine with four Intel Xeon X5550 Quad Core 2.67GHz processors and 48 GB of RAM. |
| Experiment Setup | Yes | In Case 1, we let X = {1, . . . , 100}, p = 10, L = 5, k = 4, a1 = a2 = 10, γ = 3, aσ = 1, bσ = 0.1 and κψ = κ = 10 in the Gaussian process after scaling X to (0, 1], with an additional nugget of 1e-5·I_n added to K. ... We set a1 = a2 = 2, γ = 3, and placed a Ga(1, 0.1) prior on the precision parameters σ_j^{-2}. The length-scale parameter κ was set from the data according to the heuristic described in Appendix C, and was determined to be 10 (after rounding). Details on initialization are available in Appendix D. We simulated 10,000 Gibbs iterations, discarded the first 5,000 and saved every 10th iteration. ... We simulated 5 chains each for 10,000 MCMC iterations, discarded the first 5,000 for burn-in, and thinned the chains by keeping every 10th sample. Each chain was initialized with parameters sampled from the prior. The hyperparameters were set as in the simulation study, except with larger truncation levels L = 10 and k = 20 and with the Gaussian process length-scale hyperparameter set to κψ = κ = 100 to account for the time scale (weeks) and the rate at which ILI incidences change. |
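The latent-factor representation quoted in the Pseudocode row, η_i = ψ(x_i) + ν_i with ν_i ∼ N_k(0, I_k), can be sketched as follows. This is a minimal illustration, not the authors' code: the choice of ψ, the seed, and the dimensions n and k are assumptions picked for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

k, n = 4, 100  # latent factor dimension and number of inputs (illustrative)

# Placeholder for the latent mean process psi(x); the paper models psi
# nonparametrically, so this sinusoid is purely illustrative.
def psi(x):
    return np.sin(np.outer(x, np.arange(1, k + 1)))

x = np.linspace(0, 1, n)                      # inputs scaled to (0, 1]
eta = psi(x) + rng.standard_normal((n, k))    # nu_i ~ N_k(0, I_k)
print(eta.shape)
```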
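The second hold-out scheme quoted in the Dataset Splits row (10% of observed entries removed uniformly at random across time and regions) can be sketched like this. The matrix shape, seed, and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data matrix: rows = time points, columns = regions (illustrative).
Y = rng.normal(size=(1000, 10))
observed = np.ones(Y.shape, dtype=bool)

# Hold out 10% of the observed entries uniformly at random.
idx = np.flatnonzero(observed)
held_out = rng.choice(idx, size=int(0.1 * idx.size), replace=False)
mask = observed.copy()
mask.flat[held_out] = False  # False marks held-out entries

print(mask.sum(), (~mask).sum())  # retained vs. held-out counts
```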
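The MCMC post-processing schedule quoted above (10,000 iterations, first 5,000 discarded as burn-in, every 10th retained) amounts to a simple slice. The function name and the stand-in chain below are illustrative assumptions; the paper's chains were run in MATLAB.

```python
import numpy as np

def thin_chain(samples, burn_in=5000, thin=10):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return samples[burn_in::thin]

chain = np.random.randn(10000, 3)  # stand-in for 10,000 posterior draws
kept = thin_chain(chain)
print(kept.shape[0])               # 500 retained draws per chain
```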