Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scalable Approximate MCMC Algorithms for the Horseshoe Prior
Authors: James Johndrow, Paulo Orenstein, Anirban Bhattacharya
JMLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The scalability of the algorithm is illustrated in simulations with problem sizes as large as N = 5,000 observations and p = 50,000 predictors, and an application to a genome-wide association study with N = 2,267 and p = 98,385. The empirical results also show that the new algorithm yields estimates with lower mean squared error, intervals with better coverage, and elucidates features of the posterior that were often missed by previous algorithms in high dimensions, including bimodality of posterior marginals indicating uncertainty about which covariates belong in the model. |
| Researcher Affiliation | Academia | James Johndrow EMAIL Department of Statistics Stanford University Stanford, CA, 94305, USA; Paulo Orenstein EMAIL Department of Statistics Stanford University Stanford, CA, 94305, USA; Anirban Bhattacharya EMAIL Department of Statistics Texas A&M University College Station, TX 77843-3143, USA |
| Pseudocode | Yes | 2.1. Exact Algorithm ...A blocked Metropolis-within-Gibbs algorithm that targets the exact horseshoe posterior is given by the update rule: 1. sample η ~ p(η \| ξ, β, σ²); 2. propose log(ξ*) ~ N(log(ξ), s), accept ξ* w.p. min{1, [p(ξ* \| η) ξ*] / [p(ξ \| η) ξ]}; 3. sample σ² \| η, ξ ~ InvGamma(...); 4. sample β \| η, ξ, σ² ~ N((W′W + (ξ⁻¹D)⁻¹)⁻¹ W′z, σ²(W′W + (ξ⁻¹D)⁻¹)⁻¹). ...sample u ~ N(0, ξ⁻¹D) and f ~ N(0, I_N) independently; set v = Wu + f, v* = M_ξ⁻¹(z/σ − v); set β = σ(u + ξ⁻¹DW′v*). |
| Open Source Code | No | The paper does not provide concrete access to source code. While it discusses the methodology and implementation, there is no explicit statement of code release or a link to a code repository. |
| Open Datasets | Yes | The scalability of the algorithm is illustrated in simulations... and an application to a genome-wide association study with N = 2,267 and p = 98,385. ...The data consist of N = 2,267 observations and p = 98,385 single nucleotide polymorphisms (SNPs) in the genome of maize. These data have been previously studied by Liu et al. (2016) and Zeng and Zhou (2017). Each observation corresponds to a different inbred maize line from the USDA Ames seed bank (Romay et al., 2013). |
| Dataset Splits | No | The paper describes simulation setups where N and p values are sampled or fixed, and mentions iterations and burn-in periods for MCMC. However, it does not provide specific details for splitting datasets (e.g., into training, validation, or test sets) for model evaluation. The focus is on the MCMC algorithm's performance rather than supervised learning with explicit data splits. |
| Hardware Specification | No | Computation was performed on multicore hardware with 12 threads, so matrix multiplications contribute less to the wall clock time than do matrix decompositions, resulting in the lower than expected exponents on N, p. Thus, these estimates are meant to reflect the actual performance on modern multicore hardware. |
| Software Dependencies | No | The paper mentions the "R package mcmcse" without specifying a version number. No other software components are listed with version numbers. |
| Experiment Setup | Yes | For the approximate algorithm, choosing δ is considered in detail in Section 4, along with a complete description of the simulation setup. ...we simulate from (21) with N = 1,000 and p = 10,000 for δ = 10⁻², 10⁻³, 10⁻⁴, and 10⁻⁵. We collect paths of length 20,000 from each simulation after discarding a burn-in of 5,000. ...we use an independent design. In the second simulation study, we use a correlated design with AR-1 structure and autocorrelation 0.9 as described above. ...For each of N = 200, 400, 600, ..., 2000, we perform ten replicates of the simulation in (21) with p = 20,000 and δ = 2p⁻¹ = 10⁻⁴. We run the approximate algorithm for n = 21,000 iterations, discarding B = 1,000 iterations and computing the pathwise average... ...We run the approximate algorithm for 30,000 iterations, discarding 5,000 iterations as burn-in. |
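Since the paper releases no code (see the Open Source Code row above), the fast Gaussian update quoted in the Pseudocode row (step 4, the sampling identity of Bhattacharya et al. used with M_ξ = I_N + ξ⁻¹WDW′) can be illustrated with a minimal NumPy sketch. This is an assumed reconstruction from the quoted excerpt, not the authors' implementation; the function name `sample_beta` and the argument layout (W of shape N×p, D = diag(1/η_j)) are choices made here for illustration.

```python
import numpy as np

def sample_beta(W, z, sigma, xi, eta, rng):
    """One draw of beta | eta, xi, sigma^2 from
    N((W'W + (xi^{-1}D)^{-1})^{-1} W'z, sigma^2 (W'W + (xi^{-1}D)^{-1})^{-1})
    using the u, f, v, v* identity quoted in the pseudocode excerpt.

    W   : (N, p) design matrix
    z   : (N,) response
    eta : (p,) local scale parameters; D = diag(1/eta)
    """
    N, p = W.shape
    d = 1.0 / (xi * eta)                  # diagonal of Lambda = xi^{-1} D
    u = rng.normal(0.0, np.sqrt(d))       # u ~ N(0, xi^{-1} D)
    f = rng.normal(size=N)                # f ~ N(0, I_N), independent of u
    v = W @ u + f
    M = np.eye(N) + (W * d) @ W.T         # M_xi = I_N + xi^{-1} W D W'
    v_star = np.linalg.solve(M, z / sigma - v)
    return sigma * (u + d * (W.T @ v_star))

# Example usage on a small synthetic problem (sizes chosen arbitrarily):
rng = np.random.default_rng(0)
W = rng.normal(size=(50, 200))
z = rng.normal(size=50)
beta = sample_beta(W, z, sigma=1.0, xi=1.0, eta=np.ones(200), rng=rng)
```

The point of the identity is cost: only the N×N system M_ξ v* = z/σ − v is solved, so the per-iteration cost scales with N² p rather than p³, which is what makes the p = 50,000 and p = 98,385 experiments above feasible.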