Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fast Expectation Propagation for Heteroscedastic, Lasso-Penalized, and Quantile Regression

Authors: Jackson Zhou, John T. Ormerod, Clara Grazian

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In Section 4, a set of experiments is conducted to evaluate the performance of the proposed EP implementation on smaller datasets; this is compared to that of other popular ABI methods, including a more conventional EP implementation. In Section 5, a similar set of experiments is conducted using a big data example.
Researcher Affiliation Academia Jackson Zhou, School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia; John T. Ormerod, School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia; Clara Grazian, School of Mathematics and Statistics, University of Sydney, and ARC Training Centre in Data Analytics for Resources & Environments, NSW 2006, Australia
Pseudocode Yes Algorithm 1: Gaussian power EP with damping and dimension reduction, assuming all sites can be reduced in dimension. Require: D, K, η, α, [d_k, f_k]_{k=1}^{K}. Algorithm 2: MFVB for lasso-penalized linear regression. Require: µ̃, Σ̃, d̃, and E_q[exp(−2κ)]. Algorithm 3: MFVB for quantile linear regression. Require: µ̃, Σ̃, d̃, E_q[exp(−κ)], and E_q[exp(−2κ)].
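The damped site update at the heart of Algorithm 1 can be illustrated with a minimal sketch. This is not the paper's code: the function name is hypothetical, and the paper's actual updates operate on multivariate Gaussian site parameters rather than a generic vector.

```python
import numpy as np

def damped_site_update(theta_old, theta_tilted, alpha=0.5):
    """Damped natural-parameter update as used in power EP.

    alpha = 1 recovers an undamped update; smaller alpha moves the
    site parameters only part of the way toward the tilted solution,
    trading convergence speed for stability (improper site parameters
    become less likely). Names here are illustrative assumptions.
    """
    theta_old = np.asarray(theta_old, dtype=float)
    theta_tilted = np.asarray(theta_tilted, dtype=float)
    return alpha * theta_tilted + (1.0 - alpha) * theta_old
```

With the paper's balanced choice α = 0.5, each refinement moves the site parameters halfway toward the tilted-distribution solution.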
Open Source Code Yes The code for these experiments can be found at https://github.com/jackson-zhou-sydney/EP-multicomp.
Open Datasets Yes The benchmark datasets used were Food (n = 40, p1 = 2, p2 = 2), Salary (n = 725, p1 = 6, p2 = 2), and Sniffer (n = 125, p1 = 5, p2 = 5). Food (Hill et al., 2018) records the food expenditure (the response) and income of various households. Salary (DeMaris, 2007) contains salary data (the response) on a large number of university faculty members, along with predictors such as prior experience and years worked. Both Food and Salary are used as illustrative examples in the documentation for Stata's hetregress function. Sniffer (Weisberg, 2013) records the amount of hydrocarbon emitted from a tank when gasoline is poured in (the response), along with the temperature and pressure of the tank and gasoline as predictors. The benchmark datasets used were Diabetes (Efron et al., 2004), Prostate (Hodge et al., 1989), and Eye (n = 120, p = 201). The benchmark datasets used were IgG (Isaacs et al., 1983), Engel (n = 235, p = 2), and Stack (n = 21, p = 4). For all three models, the benchmark dataset used was Energy (Candanedo, 2017; n = 19,735), which records appliance energy usage (the response) across households, given temperature and humidity predictors.
Dataset Splits Yes The lppd metric was evaluated out-of-sample using a train-test split of 87.5%/12.5%, where S = 500 samples from the method to be evaluated were taken (across many repetitions this is comparable to cross-validation). For all three models, the benchmark dataset used was Energy (Candanedo, 2017; n = 19,735), which records appliance energy usage (the response) across households, given temperature and humidity predictors. The fixed training and test sets contained n = 17,268 and n = 2,467 observations respectively.
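The split ratio quoted above can be sketched as follows (a minimal illustration; the function name and seeding scheme are assumptions, not the paper's code). Note that with n = 19,735, an 87.5%/12.5% split gives 17,268 training and 2,467 test observations, consistent with the fixed Energy sets reported.

```python
import numpy as np

def train_test_split_875(n, seed=0):
    """Random 87.5%/12.5% train-test split of n observations.

    Returns index arrays for the training and test sets. The seed
    and helper name are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(0.875 * n))
    return perm[:n_train], perm[n_train:]
```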
Hardware Specification Yes All experiments were executed across 10 computational cores running at 2.5 GHz each, with a combined 32 gigabytes of random access memory.
Software Dependencies No Both versions of EP were implemented in C++ via the R package Rcpp (Eddelbuettel and François, 2011) to facilitate fairer comparisons with other methods that were also implemented in C++, and parallel computing was handled using OpenMP (Chandra et al., 2001). We set η = 0.5 based on Seeger et al. (2007), α = 0.5 as a balanced damping value that is unlikely to result in either improper site parameters (α too high) or slow convergence (α too low), and chose 400 as the number of quadrature points in each dimension. Refinements to the site approximations were grouped into different passes through the data, so as to ensure that all site approximations are updated an equal number of times. In each pass, all site approximations are refined exactly once, with the order of these refinements randomized. For each refinement, the norms of the differences in natural parameters between the old and updated site approximations were calculated, and maximums were taken across sites for each combination of pass and natural parameter. Convergence was determined to be satisfied after a given pass (with the algorithm terminated) if each of these maximum absolute deltas was less than 0.05 times the corresponding value of the first pass. The minimum and maximum numbers of passes were set to 6 and 200 respectively. It was always the case that EP converged. Alternate ABI methods were also implemented for each model. For heteroscedastic linear regression, the derivatives of the joint likelihood can be evaluated, and the Laplace (coded as Laplace in the text and as LA in the figures/tables) and Pathfinder (Zhang et al., 2022) approximations were considered; both have multivariate Gaussian forms. The Laplace approximation was implemented using the L-BFGS algorithm with a maximum iteration count of 20,000.
Convergence was determined to be satisfied if the difference between the current and new function values was less than 10⁻⁶ times the current function value, or if the norm of the current gradient was less than 10⁻⁵ times the maximum of one and the norm of the current solution. It was always the case that the L-BFGS optimization converged. The Pathfinder approximation was run with three different values for the number of samples returned. These were 100 (the default number of samples returned), 1000, and 10,000, and were coded as Pathfinder-A, Pathfinder-B, and Pathfinder-C in the text, and as PA, PB, and PC in the figures/tables respectively. The remaining settings for Pathfinder, including those related to optimization, were left at their suggested default values in the paper, where it is noted that Pathfinder results are not sensitive to its settings. Note that the optimizations need not converge for Pathfinder to give sensible results, and so convergence was not checked. The derivatives of the joint likelihood are not always defined for lasso-penalized and quantile linear regression, and so a mean-field variational Bayes (coded as MFVB in the text and as MF in the figures/tables) approximation was implemented instead; the derivations can be found in Appendix D and Appendix E respectively. Convergence was determined to be satisfied after a given iteration of the while loop (with the algorithm terminated) if the norms of the differences between the current and new values of all parameters were less than 0.05 times the corresponding values of the first iteration. The minimum and maximum numbers of iterations were set to 6 and 200 respectively, to match those of EP. It was always the case that MFVB converged. The Laplace and MFVB approximations were implemented in C++ via the R package Rcpp (Eddelbuettel and François, 2011) to facilitate fair comparisons, and the Pathfinder R code was obtained from https://github.com/LuZhangstat/Pathfinder.
We expect Pathfinder run times to be comparable to a full C++ implementation since the computationally intensive optimization stage is already performed in C++ via the R package optimx (Nash and Varadhan, 2011).
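The L-BFGS stopping rule quoted above for the Laplace approximation can be sketched as a small predicate. The function name, argument names, and the use of absolute values on the function-value test are illustrative assumptions, not the paper's code.

```python
import numpy as np

def lbfgs_converged(f_cur, f_new, grad, x, f_tol=1e-6, g_tol=1e-5):
    """Stopping rule matching the criterion described above.

    Converged when the function value changes by less than f_tol
    times the current value, or when the gradient norm falls below
    g_tol times max(1, ||x||), where x is the current solution.
    """
    f_ok = abs(f_cur - f_new) < f_tol * abs(f_cur)
    g_ok = np.linalg.norm(grad) < g_tol * max(1.0, np.linalg.norm(x))
    return f_ok or g_ok
```

The relative gradient test guards against premature termination far from an optimum, while the relative function-value test catches flat regions where the gradient norm alone stalls.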
Experiment Setup Yes For each of 10 chains, we set 1000 warm-up iterations and 10,000 sampling iterations, for a total of 10,000 warm-up and 100,000 sampling iterations. Convergence was verified by checking that R̂ < 1.1, as per the recommendation in Gelman et al. (1995); this was always the case. Four common methods were implemented across all three models in the experiments. These were a long run of MCMC, a short run of MCMC, the proposed EP implementation, and an alternate EP implementation where bivariate numerical quadrature was used to evaluate the moments of the low-dimensional tilted distributions. These are coded as MCMC, MCMC-S, EP-1D, and EP-2D in the text, and as ML, MS, E1, and E2 in the figures/tables respectively. The long run of MCMC used the same settings as the gold standard (but a different seed), and was used to provide a rough upper bound on performance. Note that it is sometimes possible for other methods to outperform the gold standard, especially for sampling-based evaluation metrics which do not use the full sample. For the short run of MCMC, the number of sampling iterations per chain was minimized within the set {100, 200, 400, 1000, 2000, 4000, 6000, 8000, 10000}, such that R̂ < 1.1. For each chain, the number of warm-up iterations was always one-tenth the number of sampling iterations.
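The R̂ < 1.1 check used throughout the MCMC runs can be sketched with the classic Gelman-Rubin formula for a single parameter. This is a simplified illustration assuming the plain (non-split) variant; the paper's exact R̂ computation is not specified in the quoted text.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    `chains` is an (m, n) array of m chains with n draws each.
    Classic Gelman-Rubin formula; modern samplers typically use a
    split-chain refinement, which this sketch omits.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()     # W: within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)  # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return np.sqrt(var_hat / within)
```

Well-mixed chains give R̂ near 1; chains stuck in different regions inflate the between-chain variance and push R̂ well above the 1.1 threshold.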
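The pass-level EP stopping rule quoted under Software Dependencies (maximum natural-parameter deltas below 0.05 times their first-pass values, with between 6 and 200 passes) can be sketched as a small predicate. Function and argument names are illustrative assumptions.

```python
import numpy as np

def ep_pass_converged(first_pass_deltas, current_deltas,
                      pass_index, min_passes=6, rel_tol=0.05):
    """Pass-level EP stopping rule described above.

    Both delta arguments hold, for each natural parameter, the
    maximum (across sites) norm of the change in that parameter
    during one pass. Converged once every current maximum falls
    below rel_tol times its first-pass value, subject to a
    minimum number of passes (the caller enforces the maximum).
    """
    if pass_index < min_passes:
        return False
    first = np.asarray(first_pass_deltas, dtype=float)
    cur = np.asarray(current_deltas, dtype=float)
    return bool(np.all(cur < rel_tol * first))
```

Normalizing against the first pass makes the tolerance scale-free across parameters, so no single natural parameter's units dominate the stopping decision.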