Optimal Ridge Regularization for Out-of-Distribution Prediction

Authors: Pratik Patil, Jin-Hong Du, Ryan Tibshirani

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The numerical illustrations in Figure 2 demonstrate the results of Theorems 3.3 and 3.4. As shown, while the optimal ridge penalties λ for the IND prediction risks are positive, the optimal penalty for the OOD prediction risk can be negative and approach its lower limit. Similar observations also occur on the real-world MNIST dataset (see Table 2 and Appendix G.1.2 for the experimental details).
Researcher Affiliation | Academia | 1 Department of Statistics, University of California, Berkeley, CA 94720, USA. 2 Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. 3 Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks; the methods are described mathematically and through theorems.
Open Source Code | Yes | The source code for generating all of our figures is included in the supplementary material in the folder named code, along with details about the computational resources used and other relevant information.
Open Datasets | Yes | Similar observations also occur on the real-world MNIST dataset (see Table 2 and Appendix G.1.2 for the experimental details).
Dataset Splits | No | For training, we randomly select n = 64 images, and for testing, we hold out 10,000 images.
Hardware Specification | No | We also thank the computing support provided by the ACCESS allocation MTH230020 for some of the experiments performed on the Bridges2 system at the Pittsburgh Supercomputing Center.
Software Dependencies | No | The source code for generating all of our figures is included in the supplementary material in the folder named code, along with details about the computational resources used and other relevant information.
Experiment Setup | Yes | For training, we randomly select n = 64 images, and for testing, we hold out 10,000 images. The response variable y represents the digit value ranging from 0 to 9. Our model includes an intercept term, which is not subject to penalization. To estimate the expected out-of-distribution risk, we average the risks across 100 random samples from the training set.
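The experiment setup quoted in the last row can be sketched in a few lines. This is not the authors' released code; it is a minimal, hypothetical illustration that substitutes synthetic Gaussian data for the MNIST features, but follows the described recipe: a ridge fit with an unpenalized intercept (n = 64 training points) and the expected risk estimated by averaging over 100 random draws of the training set.

```python
import numpy as np

def ridge_with_intercept(X, y, lam):
    """Ridge fit with an unpenalized intercept: center X and y, solve the
    regularized normal equations, then recover the intercept.

    Note: per the paper's central finding, the optimal lam for OOD risk can
    even be negative, provided Xc.T @ Xc + lam * I remains invertible."""
    xbar, ybar = X.mean(axis=0), y.mean()
    Xc, yc = X - xbar, y - ybar
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    b0 = ybar - xbar @ beta  # intercept is not penalized
    return b0, beta

rng = np.random.default_rng(0)
n, p, n_test, reps = 64, 20, 10_000, 100  # n and reps match the quoted setup

# Hypothetical linear ground truth standing in for the MNIST regression task.
beta_true = rng.standard_normal(p) / np.sqrt(p)
X_test = rng.standard_normal((n_test, p))
y_test = X_test @ beta_true + rng.standard_normal(n_test)

risks = []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = X @ beta_true + rng.standard_normal(n)
    b0, beta = ridge_with_intercept(X, y, lam=1.0)
    risks.append(np.mean((y_test - (b0 + X_test @ beta)) ** 2))

print(f"estimated risk (avg over {reps} training draws): {np.mean(risks):.3f}")
```

Swapping the test distribution for a shifted one (a different covariance or signal) and sweeping `lam` over a grid that includes negative values is the natural way to reproduce the qualitative behavior reported in Figure 2.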