Optimal Ridge Regularization for Out-of-Distribution Prediction
Authors: Pratik Patil, Jin-Hong Du, Ryan Tibshirani
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The numerical illustrations in Figure 2 demonstrate the results of Theorems 3.3 and 3.4. As shown, while the optimal ridge penalties λ for the IND prediction risks are positive, the optimal penalty for the OOD prediction risk can be negative, approaching its lower limit. Similar observations also occur on the real-world MNIST dataset (see Table 2 and Appendix G.1.2 for the experimental details). |
| Researcher Affiliation | Academia | 1Department of Statistics, University of California, Berkeley, CA 94720, USA. 2Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. 3Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks; the methods are described mathematically and through theorems. |
| Open Source Code | Yes | The source code for generating all of our figures is included in the supplementary material in the folder named code, along with details about the computational resources used and other relevant information. |
| Open Datasets | Yes | Similar observations also occur on the real-world MNIST dataset (see Table 2 and Appendix G.1.2 for the experimental details). |
| Dataset Splits | No | For training, we randomly select n = 64 images, and for testing, we hold out 10,000 images. |
| Hardware Specification | No | We also thank the computing support provided by the ACCESS allocation MTH230020 for some of the experiments performed on the Bridges2 system at the Pittsburgh Supercomputing Center. |
| Software Dependencies | No | The source code for generating all of our figures is included in the supplementary material in the folder named code, along with details about the computational resources used and other relevant information. |
| Experiment Setup | Yes | For training, we randomly select n = 64 images, and for testing, we hold out 10,000 images. The response variable y represents the digit value ranging from 0 to 9. Our model includes an intercept term, which is not subject to penalization. To estimate the expected out-of-distribution risk, we average the risks across 100 random samples from the training set. |
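The paper's central finding, that the optimal ridge penalty λ for out-of-distribution prediction can be negative under covariate shift, can be illustrated with a minimal sketch. This is not the authors' code (their implementation is in the supplementary `code` folder); it is a hypothetical setup in which train and test covariates follow different covariances, and we sweep a grid of λ values (including negative ones) to locate the grid minimizer of the OOD risk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shift (not the paper's MNIST experiment): isotropic train
# covariance vs. an anisotropic test covariance. n = 64 matches the paper's
# training-set size; p = 32 is an arbitrary choice for illustration.
n, p = 64, 32
beta = rng.standard_normal(p) / np.sqrt(p)
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)
Sigma_test = np.diag(np.linspace(0.2, 2.0, p))  # assumed test covariance

def ridge(X, y, lam):
    """Closed-form ridge estimate (X^T X / n + lam I)^{-1} X^T y / n.

    lam may be negative as long as the regularized Gram matrix stays
    positive definite (here the smallest eigenvalue of X^T X / n is
    well above the grid's lower end)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X / len(y) + lam * np.eye(p),
                           X.T @ y / len(y))

def ood_risk(b):
    # Excess OOD prediction risk E[(x^T b - x^T beta)^2] under Sigma_test.
    d = b - beta
    return d @ Sigma_test @ d

lams = np.linspace(-0.04, 1.0, 200)
risks = np.array([ood_risk(ridge(X, y, lam)) for lam in lams])
best = lams[int(np.argmin(risks))]
print(f"grid-optimal lambda for OOD risk: {best:.3f}")
```

Depending on the draw and the shift, the grid minimizer can land below zero, which is the qualitative phenomenon Theorems 3.3 and 3.4 characterize; for in-distribution risk (replace `Sigma_test` with the identity) the optimum is positive.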