Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Low-rank Variational Bayes correction to the Laplace method

Authors: Janet van Niekerk, Håvard Rue

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we propose a hybrid approximate method called Low-Rank Variational Bayes correction (VBC), that uses the Laplace method and subsequently a Variational Bayes correction in a lower dimension, to the joint posterior mean. We illustrate this convergence using simulated and real examples, and we compare the posterior from a Gaussian approximation with the VB correction, to the posterior from MCMC samples in Section 3. We simulate two samples from the proposed model, one of size n = 20 and another of size n = 100, and the data are presented in Figure 1 (left). The posterior means for the Laplace method, MCMC, HMC and the VBC methods are presented in Table 1 for the latent field and selected linear predictors. We consider two real data examples of different sized data sets.
Researcher Affiliation | Academia | Janet van Niekerk (janet.van EMAIL), Statistics Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Håvard Rue (EMAIL), Statistics Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Pseudocode | Yes | Our proposal to approximate the joint posterior can be summarized as follows:
1. Calculate the gradient γ, and the negative Hessian matrix H, of log π(ψ|y).
2. Find the MAP estimator by solving for ψ0 such that H|_{ψ=ψ0} ψ0 = γ|_{ψ=ψ0} + H|_{ψ=ψ0} ψ0, and define Q0 = H|_{ψ=ψ0} and b0 = γ|_{ψ=ψ0} + H|_{ψ=ψ0} ψ0.
3. Decide on the set of indices for correction, I; construct the p × m matrix Q_I^{-1} from the corresponding columns of the inverse of Q0, Q0^{-1}; and solve for λ such that λ = argmin_λ { E_{ψ ~ N(ψ0 + Q_I^{-1} λ, Q0^{-1})}[ −log π(y|ψ) ] + KLD( φ(ψ | ψ0 + Q_I^{-1} λ, Q0^{-1}) || π(ψ) ) }.
4. The approximate posterior of ψ is Gaussian with mean ψ1 = ψ0 + Q_I^{-1} λ and precision matrix Q0.
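The four steps above can be sketched for a toy case. This is a minimal, illustrative Python sketch, not the paper's R-INLA implementation: the Poisson GLM, the Gaussian prior with precision tau, the choice of correction set I = {0}, and the closed-form expected negative log-likelihood are all assumptions made for the demo.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy model (assumed for illustration): y_i ~ Poisson(exp(b0 + b1*x_i)),
# with Gaussian prior psi = (b0, b1) ~ N(0, tau^{-1} I).
n, tau = 100, 1.0
x = rng.normal(size=n)
A = np.column_stack([np.ones(n), x])                 # design matrix
y = rng.poisson(np.exp(A @ np.array([1.0, 0.5])))

def grad_hess(psi):
    """Gradient gamma and negative Hessian H of log pi(psi | y)."""
    mu = np.exp(A @ psi)
    g = A.T @ (y - mu) - tau * psi
    H = A.T @ (mu[:, None] * A) + tau * np.eye(2)
    return g, H

# Steps 1-2: Newton iteration to the MAP psi0; Q0 is the Laplace precision.
psi0 = np.zeros(2)
for _ in range(50):
    g, H = grad_hess(psi0)
    step = np.linalg.solve(H, g)
    psi0 = psi0 + step
    if np.linalg.norm(step) < 1e-10:
        break
_, Q0 = grad_hess(psi0)
Q0inv = np.linalg.inv(Q0)

# Step 3: correct only the intercept, i.e. index set I = {0}; the candidate
# mean is psi0 + Q_I^{-1} lambda with lambda a scalar here (m = 1).
QinvI = Q0inv[:, [0]]                                # p x m block of Q0^{-1}
s2 = np.einsum('ij,jk,ik->i', A, Q0inv, A)           # Var(eta_i) under N(., Q0^{-1})

def objective(lam):
    psi1 = psi0 + (QinvI @ lam)
    mu_eta = A @ psi1
    # E[-log lik] has a closed form for Poisson with Gaussian eta (dropping
    # the lambda-free log(y!) term); the KL to the N(0, tau^{-1} I) prior
    # contributes only its mean term, tau/2 * ||psi1||^2, as lambda varies.
    e_nll = np.sum(np.exp(mu_eta + s2 / 2) - y * mu_eta)
    return e_nll + 0.5 * tau * psi1 @ psi1

res = minimize(objective, x0=np.zeros(1))
psi1 = psi0 + (QinvI @ res.x)                        # Step 4: corrected mean
print("Laplace mean:", psi0, " VBC-corrected mean:", psi1)
```

The low-rank structure is what makes the correction cheap: the optimization runs over m = |I| variables rather than the full latent dimension, while the Laplace precision Q0 is kept fixed.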
Open Source Code | Yes | The examples presented herein can be reproduced based on the code available at https://github.com/JanetVN1201/Code_for_papers/tree/main/Low-rank%20VB%20correction%20to%20GA.
Open Datasets | Yes | We simulate two samples from the proposed model, one of size n = 20 and another of size n = 100, and the data are presented in Figure 1 (left). We simulate a sample of n = 1000 counts and the data are presented in Figure 2 (left). The Tokyo data set (Rue and Held, 2005) in the R-INLA library contains information on the number of times the daily rainfall measurement in Tokyo exceeded 1 mm on a specific day t over two consecutive years. Consider the R data set Leuk that features the survival times of 1043 patients with acute myeloid leukemia (AML) in Northwest England between 1982 and 1998; for more details see Henderson et al. (2002).
Dataset Splits | No | The paper uses simulated datasets and references two real-world datasets (Tokyo and Leuk). While it provides details on the size and origin of these datasets, it does not explicitly mention any training, validation, or test splits. For the Leuk dataset, it refers to an 'augmented data set of size 11738', but this is not a data split for experimental evaluation purposes.
Hardware Specification | No | The paper discusses computational efficiency and runtime comparisons between methods (e.g., 'The time for all methods were less than 6 seconds...', 'the excessive computational time for MCMC and HMC is clear'), but it does not specify any particular hardware components such as CPU, GPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions several software tools and libraries: 'a Gibbs sampler (using the runjags library) and HMC (using Stan)' and the 'R-INLA library', which includes the 'inla function'. However, specific version numbers for any of these software dependencies are not provided.
Experiment Setup | Yes | We use β0 = 1, β1 = 0.5 and a continuous covariate x, simulated as x ~ N(0, 1). The overdispersion is simulated as u_i ~ N(0, 0.25). We assume the following illustrative priors: β0 ~ t(5), β1 ~ U(−3, 3) and u ~ N(0, 0.25I), i.e. β0 follows a Student's t prior with 5 degrees of freedom, β1 follows a uniform distribution on (−3, 3), and the random effects are independent and identically distributed with a fixed marginal precision of 4. We used a Gibbs sampler (using the runjags library) and HMC (using Stan) with a burn-in of 10^2 and a sample of size 10^5, as the gold standard... For the MCMC we used a Gibbs sampler with a burn-in of 10^3 and a sample of size 10^5. For Section 5.1, 'we fix the hyperparameter τ = 1'.