Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fast Expectation Propagation for Heteroscedastic, Lasso-Penalized, and Quantile Regression

Authors: Jackson Zhou, John T. Ormerod, Clara Grazian

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In Section 4, a set of experiments is conducted to evaluate the performance of the proposed EP implementation on smaller datasets; this is compared to that of other popular ABI methods, including a more conventional EP implementation. In Section 5, a similar set of experiments is conducted using a big data example.
Researcher Affiliation Academia Jackson Zhou, School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia; John T. Ormerod, School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia; Clara Grazian, School of Mathematics and Statistics, University of Sydney, and ARC Training Centre in Data Analytics for Resources & Environments, NSW 2006, Australia
Pseudocode Yes Algorithm 1: Gaussian power EP with damping and dimension reduction, assuming all sites can be reduced in dimension. Require: D, K, η, α, [d_k, f_k]_{k=1}^{K}. Algorithm 2: MFVB for lasso-penalized linear regression. Require: µ̃, Σ̃, d̃, and E_q[exp(−2κ)]. Algorithm 3: MFVB for quantile linear regression. Require: µ̃, Σ̃, d̃, E_q[exp(−κ)], and E_q[exp(−2κ)].
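The damped site update at the heart of Algorithm 1 can be illustrated with a minimal sketch. This is not the paper's code: the function name is hypothetical, and the paper's actual updates operate on multivariate Gaussian site parameters rather than a generic vector.

```python
import numpy as np

def damped_site_update(theta_old, theta_tilted, alpha=0.5):
    """Damped natural-parameter update as used in power EP.

    alpha = 1 recovers an undamped update; smaller alpha moves the
    site parameters only part of the way toward the tilted solution,
    trading convergence speed for stability (improper site parameters
    become less likely). Names here are illustrative assumptions.
    """
    theta_old = np.asarray(theta_old, dtype=float)
    theta_tilted = np.asarray(theta_tilted, dtype=float)
    return alpha * theta_tilted + (1.0 - alpha) * theta_old
```

With the paper's balanced choice α = 0.5, each refinement moves the site parameters halfway toward the tilted-distribution solution.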
Open Source Code Yes The code for these experiments can be found at https://github.com/jackson-zhou-sydney/EP-multicomp.
Open Datasets Yes The benchmark datasets used were Food (n = 40, p1 = 2, p2 = 2), Salary (n = 725, p1 = 6, p2 = 2), and Sniffer (n = 125, p1 = 5, p2 = 5). Food (Hill et al., 2018) records the food expenditure (the response) and income of various households. Salary (DeMaris, 2007) contains salary data (the response) on a large number of university faculty members, along with predictors such as prior experience and years worked. Both Food and Salary are used as illustrative examples in the documentation for Stata's hetregress function. Sniffer (Weisberg, 2013) records the amount of hydrocarbon emitted from a tank when gasoline is poured in (the response), along with the temperature and pressure of the tank and gasoline as predictors. The benchmark datasets used were Diabetes (Efron et al., 2004), Prostate (Hodge et al., 1989), and Eye (n = 120, p = 201). The benchmark datasets used were IgG (Isaacs et al., 1983), Engel (n = 235, p = 2), and Stack (n = 21, p = 4). For all three models, the benchmark dataset used was Energy (Candanedo, 2017; n = 19,735), which records appliance energy usage (the response) across households, given temperature and humidity predictors.
Dataset Splits Yes The lppd metric was evaluated out-of-sample using a train-test split of 87.5%/12.5%, where S = 500 samples from the method to be evaluated were taken (across many repetitions this is comparable to cross-validation). For all three models, the benchmark dataset used was Energy (Candanedo, 2017; n = 19,735), which records appliance energy usage (the response) across households, given temperature and humidity predictors. The fixed training and test sets contained n = 17,268 and n = 2,467 observations respectively.
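The split ratio quoted above can be sketched as follows (a minimal illustration; the function name and seeding scheme are assumptions, not the paper's code). Note that with n = 19,735, an 87.5%/12.5% split gives 17,268 training and 2,467 test observations, consistent with the fixed Energy sets reported.

```python
import numpy as np

def train_test_split_875(n, seed=0):
    """Random 87.5%/12.5% train-test split of n observations.

    Returns index arrays for the training and test sets. The seed
    and helper name are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(0.875 * n))
    return perm[:n_train], perm[n_train:]
```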
Hardware Specification Yes All experiments were executed across 10 computational cores running at 2.5 GHz each, with a combined 32 gigabytes of random access memory.
Software Dependencies No Both versions of EP were implemented in C++ via the R package Rcpp (Eddelbuettel and François, 2011) to facilitate fairer comparisons with other methods that were also implemented in C++, and parallel computing was handled using OpenMP (Chandra et al., 2001). We set η = 0.5 based on Seeger et al. (2007), α = 0.5 as a balanced damping value that is unlikely to result in either improper site parameters (α too high) or slow convergence (α too low), and chose 400 as the number of quadrature points in each dimension. Refinements to the site approximations were grouped into different passes through the data, so as to ensure that all site approximations are updated an equal number of times. In each pass, all site approximations are refined exactly once, with the order of these refinements randomized. For each refinement, the norms of the differences in natural parameters between the old and updated site approximations were calculated, and maximums were taken across sites for each combination of pass and natural parameter. Convergence was determined to be satisfied after a given pass (with the algorithm terminated) if each of these maximum absolute deltas was less than 0.05 times the corresponding value of the first pass. The minimum and maximum numbers of passes were set to 6 and 200 respectively. It was always the case that EP converged. Alternate ABI methods were also implemented for each model. For heteroscedastic linear regression, the derivatives of the joint likelihood can be evaluated, and the Laplace (coded as Laplace in the text and as LA in the figures/tables) and Pathfinder (Zhang et al., 2022) approximations were considered; both have multivariate Gaussian forms. The Laplace approximation was implemented using the L-BFGS algorithm with a maximum iteration count of 20,000.
Convergence was determined to be satisfied if the difference between the current and new function values was less than 10⁻⁶ times the current function value, or if the norm of the current gradient was less than 10⁻⁵ times the maximum of one and the norm of the current solution. It was always the case that the L-BFGS optimization converged. The Pathfinder approximation was run with three different values for the number of samples returned. These were 100 (the default number of samples returned), 1000, and 10,000, and were coded as Pathfinder-A, Pathfinder-B, and Pathfinder-C in the text, and as PA, PB, and PC in the figures/tables respectively. The remaining settings for Pathfinder, including those related to optimization, were left at their suggested default values in the paper, where it is noted that Pathfinder results are not sensitive to its settings. Note that the optimizations need not converge for Pathfinder to give sensible results, and so convergence was not checked. The derivatives of the joint likelihood are not always defined for lasso-penalized and quantile linear regression, and so a mean-field variational Bayes (coded as MFVB in the text and as MF in the figures/tables) approximation was implemented instead; the derivations can be found in Appendix D and Appendix E respectively. Convergence was determined to be satisfied after a given iteration of the while loop (with the algorithm terminated) if the norms of the differences between the current and new values of all parameters were less than 0.05 times the corresponding values of the first iteration. The minimum and maximum numbers of iterations were set to 6 and 200 respectively, to match those of EP. It was always the case that MFVB converged. The Laplace and MFVB approximations were implemented in C++ via the R package Rcpp (Eddelbuettel and François, 2011) to facilitate fair comparisons, and the Pathfinder R code was obtained from https://github.com/LuZhangstat/Pathfinder.
We expect Pathfinder run times to be comparable to a full C++ implementation since the computationally intensive optimization stage is already performed in C++ via the R package optimx (Nash and Varadhan, 2011).
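The L-BFGS stopping rule quoted above for the Laplace approximation can be sketched as a small predicate. The function name, argument names, and the use of absolute values on the function-value test are illustrative assumptions, not the paper's code.

```python
import numpy as np

def lbfgs_converged(f_cur, f_new, grad, x, f_tol=1e-6, g_tol=1e-5):
    """Stopping rule matching the criterion described above.

    Converged when the function value changes by less than f_tol
    times the current value, or when the gradient norm falls below
    g_tol times max(1, ||x||), where x is the current solution.
    """
    f_ok = abs(f_cur - f_new) < f_tol * abs(f_cur)
    g_ok = np.linalg.norm(grad) < g_tol * max(1.0, np.linalg.norm(x))
    return f_ok or g_ok
```

The relative gradient test guards against premature termination far from an optimum, while the relative function-value test catches flat regions where the gradient norm alone stalls.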
Experiment Setup Yes For each of 10 chains, we set 1000 warm-up iterations and 10,000 sampling iterations, for a total of 10,000 warm-up and 100,000 sampling iterations. Convergence was verified by checking that R̂ < 1.1, as per the recommendation in Gelman et al. (1995); this was always the case. Four common methods were implemented across all three models in the experiments. These were a long run of MCMC, a short run of MCMC, the proposed EP implementation, and an alternate EP implementation where bivariate numerical quadrature was used to evaluate the moments of the low-dimensional tilted distributions. These are coded as MCMC, MCMC-S, EP-1D, and EP-2D in the text, and as ML, MS, E1, and E2 in the figures/tables respectively. The long run of MCMC used the same settings as the gold standard (but a different seed), and was used to provide a rough upper bound on performance. Note that it is sometimes possible for other methods to outperform the gold standard, especially for sampling-based evaluation metrics which do not use the full sample. For the short run of MCMC, the number of sampling iterations per chain was minimized within the set {100, 200, 400, 1000, 2000, 4000, 6000, 8000, 10000}, such that R̂ < 1.1. For each chain, the number of warm-up iterations was always one-tenth the number of sampling iterations.
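The R̂ < 1.1 check used throughout the MCMC runs can be sketched with the classic Gelman-Rubin formula for a single parameter. This is a simplified illustration assuming the plain (non-split) variant; the paper's exact R̂ computation is not specified in the quoted text.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    `chains` is an (m, n) array of m chains with n draws each.
    Classic Gelman-Rubin formula; modern samplers typically use a
    split-chain refinement, which this sketch omits.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()     # W: within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)  # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return np.sqrt(var_hat / within)
```

Well-mixed chains give R̂ near 1; chains stuck in different regions inflate the between-chain variance and push R̂ well above the 1.1 threshold.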
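The pass-level EP stopping rule quoted under Software Dependencies (maximum natural-parameter deltas below 0.05 times their first-pass values, with between 6 and 200 passes) can be sketched as a small predicate. Function and argument names are illustrative assumptions.

```python
import numpy as np

def ep_pass_converged(first_pass_deltas, current_deltas,
                      pass_index, min_passes=6, rel_tol=0.05):
    """Pass-level EP stopping rule described above.

    Both delta arguments hold, for each natural parameter, the
    maximum (across sites) norm of the change in that parameter
    during one pass. Converged once every current maximum falls
    below rel_tol times its first-pass value, subject to a
    minimum number of passes (the caller enforces the maximum).
    """
    if pass_index < min_passes:
        return False
    first = np.asarray(first_pass_deltas, dtype=float)
    cur = np.asarray(current_deltas, dtype=float)
    return bool(np.all(cur < rel_tol * first))
```

Normalizing against the first pass makes the tolerance scale-free across parameters, so no single natural parameter's units dominate the stopping decision.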