Bayes beats Cross Validation: Efficient and Accurate Ridge Regression via Expectation Maximization
Authors: Shu Yu Tew, Mario Boley, Daniel Schmidt
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical results on both synthetic and real-world datasets. To implement the LOOCV estimator, we use a predefined grid, L = (λ1, ..., λl). We use the two most common methods for this task: (i) a fixed grid, arbitrarily selecting a very small value as λmin and a large value as λmax, and constructing a sequence of l values from λmax to λmin on a log scale; (ii) a data-driven grid, finding the smallest value of λmax that sets the entire regression coefficient vector to zero (i.e. β̂ = 0), multiplying this value by a ratio κ such that λmin = κλmax, and creating a sequence from λmax to λmin on a log scale. The latter method is implemented in the glmnet package in combination with an adaptive κ coefficient. |
| Researcher Affiliation | Academia | Shu Yu Tew Monash University shu.tew@monash.edu Mario Boley Monash University mario.boley@monash.edu Daniel F. Schmidt Monash University daniel.schmidt@monash.edu |
| Pseudocode | Yes | All time complexities are summarized in Tab. 1 and detailed pseudocode for both the fast EM algorithm and the fast LOOCV algorithm is provided in the Appendix (see Table 3 and 4). |
| Open Source Code | Yes | Our implementation of both algorithms, along with all experiment code, is publicly available in the standard package ecosystems of the R and Python platforms, as well as on GitHub: https://github.com/marioboley/fastridge.git |
| Open Datasets | Yes | We evaluated our EM method on 24 real-world datasets. This includes 21 datasets from the UCI machine learning repository [5] (unless referenced otherwise) for normal linear regression tasks and 3 time-series datasets from the UCR repository [10] for multitarget regression tasks. |
| Dataset Splits | Yes | For each experiment, we repeated the process 100 times and used a random 70/30 train-test split. Due to memory limitations, we limit our design matrix size to a maximum of 35 million entries. If the number of transformed predictors exceeded this limit, we uniformly sub-sampled the interaction variables to ensure that p ≤ 35,000,000/(0.7n), and then fit the model using the sampled variables. Note that we always keep the original variables (main effects) and sub-sample only the interactions. In the case of multitarget regression, we performed a random 70/30 train-test split and repeated the experiment 30 times. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. It mentions using R and Python platforms but provides no specific CPU, GPU, or other hardware details. |
| Software Dependencies | No | The paper mentions using "scikit-learn" and the "glmnet package" for LOOCV, and states experiments were performed in "Python and the R statistical platform". However, it does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | Our EM algorithm does not require a predefined penalty grid, but it needs a convergence threshold, which we set to ϵ = 10⁻⁸. All experiments in this section are performed in Python and the R statistical platform. ... We consider a fixed grid of λ = (10⁻¹⁰, ..., 10¹⁰) and the grid based on the glmnet heuristic; in both cases, we use a sequence of length 100. ... For each experiment, we repeated the process 100 times and used a random 70/30 train-test split. |
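The two grid-construction schemes quoted above (the fixed log-scale grid and the data-driven glmnet-style grid) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are invented here, and the λmax formula max|Xᵀy|/n is the lasso heuristic that glmnet uses — for pure ridge the exact coefficient-zeroing value is unbounded, so treating it as λmax is an assumption.

```python
import numpy as np

def fixed_grid(lam_min=1e-10, lam_max=1e10, length=100):
    # Fixed grid: `length` values from lam_max down to lam_min on a log
    # scale, matching the paper's lambda = (10^-10, ..., 10^10) of length 100.
    return np.logspace(np.log10(lam_max), np.log10(lam_min), num=length)

def data_driven_grid(X, y, kappa=1e-4, length=100):
    # Data-driven grid: start from a lam_max intended to zero out all
    # coefficients, then span down to lam_min = kappa * lam_max on a log
    # scale. ASSUMPTION: lam_max = max|X^T y| / n is the lasso-style
    # heuristic from glmnet; the paper does not spell out the formula here.
    n = X.shape[0]
    lam_max = np.max(np.abs(X.T @ y)) / n
    lam_min = kappa * lam_max
    return np.logspace(np.log10(lam_max), np.log10(lam_min), num=length)
```

Both grids are descending, which matches the paper's phrasing of sequences "from λmax to λmin"; glmnet additionally adapts κ to the data, which is omitted here.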
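The design-matrix cap described under Dataset Splits (p ≤ 35,000,000/(0.7n), keeping all main effects and uniformly sub-sampling interactions) can also be sketched. The helper name and return convention below are hypothetical; only the cap formula and the keep-main-effects rule come from the quoted text.

```python
import numpy as np

def subsample_interactions(n, p_main, p_inter, max_entries=35_000_000,
                           train_frac=0.7, seed=0):
    # Cap from the paper: p <= max_entries / (train_frac * n), where p counts
    # all retained predictors. Main effects are always kept; interaction
    # columns are uniformly sub-sampled to fit under the cap.
    p_cap = int(max_entries / (train_frac * n))
    keep = max(0, min(p_inter, p_cap - p_main))
    rng = np.random.default_rng(seed)
    idx = rng.choice(p_inter, size=keep, replace=False)
    return np.sort(idx)  # indices of interaction columns to retain
```

For example, with n = 1000 training-pool rows the cap is 50,000 predictors, so 10 main effects leave room for at most 49,990 interaction columns.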