Generalized equivalences between subsampling and ridge regularization
Authors: Pratik Patil, Jin-Hong Du
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the practical implications of our findings on real-world datasets (see Section 6). Figure 1: Heat map of the various generalized risks (estimation risk, training error, prediction risk, out-of-distribution (OOD) prediction risk) of full-ensemble ridge estimators (approximated with M = 100), for varying ridge penalties λ and subsample aspect ratios ψ = p/k on the log-log scale. The data model is described in Appendix F.2 with p = 500, n = 5000, and ϕ = p/n = 0.1. |
| Researcher Affiliation | Academia | Pratik Patil Department of Statistics University of California Berkeley, CA 94720, USA pratikpatil@berkeley.edu Jin-Hong Du Department of Statistics and Data Science & Machine Learning Department Carnegie Mellon University Pittsburgh, PA 15213, USA jinhongd@andrew.cmu.edu |
| Pseudocode | No | The paper describes mathematical concepts and derivations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for reproducing the results of this paper can be found at https://jaydu1.github.io/overparameterized-ensembling/equiv. |
| Open Datasets | Yes | We conduct experiments on real-world datasets to examine the equivalence in a more general setting. We utilized three image datasets for our experimental analysis: CIFAR-10, MNIST, and USPS [59]. |
| Dataset Splits | Yes | The training sample sizes, the feature dimensions, and the test sample sizes (n, p, nte) are (10000, 3072, 2000), (12873, 784, 2145), and (22358, 3072, 7981) for the three datasets, respectively. |
| Hardware Specification | No | This work used the Bridges2 system at the Pittsburgh Supercomputing Center (PSC) through allocations MTH230020 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program. |
| Software Dependencies | Yes | using the default parameters as in Python package scikit-learn v1.2.2 [60] |
| Experiment Setup | Yes | For the simulations, we set ρ_ar1 = 0.5. For finite ensembles, the risks are averaged across 50 simulations. For the CIFAR-10 dataset, we subset the images labeled as "dog" and "cat". For other datasets, we subset the images labeled "3" and "8". Then we treat them as binary labels y ∈ {0, 1} and use the flattened image as our feature vector x. The prediction risk of the ridge ensemble (M = 100) is computed based on the random feature φ(F x_i) and the response y_i, where F ∈ R^{d×p} is the random weight matrix with F_ij iid N(0, p^{-1}). Here, φ is a nonlinear activation function (sigmoid, ReLU, or tanh). For the experiment, we set p = 250, d = 500, and ϕ = d/n = 0.1. |
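The setup quoted above can be sketched as follows: map inputs through a fixed random-feature layer φ(F x) and average M ridge fits, each trained on a size-k subsample. This is a minimal illustration, not the authors' released code; the dimensions, subsample size, activation choice (sigmoid), and the synthetic data are scaled-down assumptions for a quick run, while the ridge fit uses scikit-learn defaults as the paper does.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Scaled-down dimensions (the paper uses p = 250, d = 500, M = 100)
n, p, d = 200, 25, 50
k = 100      # subsample size; subsample aspect ratio psi = d/k
M = 10       # number of ensemble members
lam = 1e-1   # ridge penalty lambda

# Synthetic stand-in for flattened image features and binary labels
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, size=n).astype(float)

# Random-feature map: F in R^{d x p} with F_ij ~ N(0, 1/p); phi = sigmoid
F = rng.normal(0.0, np.sqrt(1.0 / p), size=(d, p))
phi = lambda t: 1.0 / (1.0 + np.exp(-t))
Z = phi(X @ F.T)                  # n x d random features

# Subsample-and-average ridge ensemble
X_test = rng.standard_normal((20, p))
Z_test = phi(X_test @ F.T)
preds = np.zeros(len(Z_test))
for _ in range(M):
    idx = rng.choice(n, size=k, replace=False)      # draw k of n rows
    model = Ridge(alpha=lam).fit(Z[idx], y[idx])    # ridge on the subsample
    preds += model.predict(Z_test)
preds /= M                                          # ensemble average
print(preds.shape)
```

Varying `lam` and `k` jointly traces out the (λ, ψ) grid that the paper's heat maps (Figure 1) visualize.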