Generalized equivalences between subsampling and ridge regularization
Authors: Pratik Patil, Jin-Hong Du
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the practical implications of our findings on real-world datasets (see Section 6). Figure 1: Heat map of the various generalized risks (estimation risk, training error, prediction risk, out-of-distribution (OOD) prediction risk) of full-ensemble ridge estimators (approximated with M = 100), for varying ridge penalties λ and subsample aspect ratios ψ = p/k on the log-log scale. The data model is described in Appendix F.2 with p = 500, n = 5000, and ϕ = p/n = 0.1. |
| Researcher Affiliation | Academia | Pratik Patil Department of Statistics University of California Berkeley, CA 94720, USA pratikpatil@berkeley.edu Jin-Hong Du Department of Statistics and Data Science & Machine Learning Department Carnegie Mellon University Pittsburgh, PA 15213, USA jinhongd@andrew.cmu.edu |
| Pseudocode | No | The paper describes mathematical concepts and derivations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for reproducing the results of this paper can be found at https://jaydu1.github.io/overparameterized-ensembling/equiv. |
| Open Datasets | Yes | We conduct experiments on real-world datasets to examine the equivalence in a more general setting. We utilized three image datasets for our experimental analysis: CIFAR-10, MNIST, and USPS [59]. |
| Dataset Splits | Yes | The training sample sizes, the feature dimensions, and the test sample sizes (n, p, nte) are (10000, 3072, 2000), (12873, 784, 2145), and (22358, 3072, 7981) for the three datasets, respectively. |
| Hardware Specification | No | This work used the Bridges2 system at the Pittsburgh Supercomputing Center (PSC) through allocations MTH230020 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program. |
| Software Dependencies | Yes | using the default parameters as in Python package scikit-learn v1.2.2 [60] |
| Experiment Setup | Yes | For the simulations, we set ρ_ar1 = 0.5. For finite ensembles, the risks are averaged across 50 simulations. For the CIFAR-10 dataset, we subset the images labeled as "dog" and "cat". For other datasets, we subset the images labeled "3" and "8". Then we treat them as binary labels y ∈ {0, 1} and use the flattened image as our feature vector x. The prediction risk of the ridge ensemble (M = 100) is computed based on the random feature φ(F x_i) and the response y_i, where F ∈ R^{d×p} is the random weight matrix with F_ij iid N(0, p^{-1}). Here, φ is a nonlinear activation function (sigmoid, ReLU, or tanh). For the experiment, we set p = 250, d = 500, and ϕ = d/n = 0.1. |
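The setup quoted above can be sketched as follows: map inputs through a fixed random-feature layer φ(F x) and average M ridge fits, each trained on a size-k subsample. This is a minimal illustration, not the authors' released code; the dimensions, subsample size, activation choice (sigmoid), and the synthetic data are scaled-down assumptions for a quick run, while the ridge fit uses scikit-learn defaults as the paper does.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Scaled-down dimensions (the paper uses p = 250, d = 500, M = 100)
n, p, d = 200, 25, 50
k = 100      # subsample size; subsample aspect ratio psi = d/k
M = 10       # number of ensemble members
lam = 1e-1   # ridge penalty lambda

# Synthetic stand-in for flattened image features and binary labels
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, size=n).astype(float)

# Random-feature map: F in R^{d x p} with F_ij ~ N(0, 1/p); phi = sigmoid
F = rng.normal(0.0, np.sqrt(1.0 / p), size=(d, p))
phi = lambda t: 1.0 / (1.0 + np.exp(-t))
Z = phi(X @ F.T)                  # n x d random features

# Subsample-and-average ridge ensemble
X_test = rng.standard_normal((20, p))
Z_test = phi(X_test @ F.T)
preds = np.zeros(len(Z_test))
for _ in range(M):
    idx = rng.choice(n, size=k, replace=False)      # draw k of n rows
    model = Ridge(alpha=lam).fit(Z[idx], y[idx])    # ridge on the subsample
    preds += model.predict(Z_test)
preds /= M                                          # ensemble average
print(preds.shape)
```

Varying `lam` and `k` jointly traces out the (λ, ψ) grid that the paper's heat maps (Figure 1) visualize.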