Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
How does overparametrization affect performance on minority groups?
Authors: Saptarshi Roy, Subha Maity, Songkai Xue, Mikhail Yurochkin, Yuekai Sun
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we complement these empirical studies with a theoretical investigation of the risk of overparameterized random feature regression models on minority groups... In an experiment with California housing prices dataset (see Appendix D)... we also provide a simulation study for the effect of overparameterization for classifications with random features. |
| Researcher Affiliation | Collaboration | Saptarshi Roy EMAIL Department of Computer Science University of Texas at Austin; Subha Maity EMAIL Department of Statistics & Actuarial Science University of Waterloo; Songkai Xue EMAIL Department of Statistics University of Michigan, Ann Arbor; Mikhail Yurochkin EMAIL MIT-IBM Watson AI Lab; Yuekai Sun EMAIL Department of Statistics University of Michigan, Ann Arbor |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It primarily presents mathematical derivations and theoretical analyses. |
| Open Source Code | Yes | Codes are available at https://github.com/smaityumich/overparameterized-group-fairness. |
| Open Datasets | Yes | In an experiment with California housing prices dataset (see Appendix D)... https://www.kaggle.com/datasets/camnugent/california-housing-prices |
| Dataset Splits | Yes | Furthermore, we split the data into training (80%) and test (20%) datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments or simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In the simulation for Figure 8 we let σ(·) be the ReLU activation function and θ_{i,j} be IID standard normal distributed. Moreover, we let π = 0.95, β_0 = 10e_1, β_1 = 10 cos(θ)e_1 + 10 sin(θ)e_2, n = 400, d = 200, N = γn where e_1 and e_2 are the first two standard basis of ℝ^d. We tune hyperparameters θ ∈ {0°, 45°, 90°, 135°, 180°} and γ ∈ {0.5, 1, ..., 3}, then report test errors averaged over 20 replicates. |
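
The experiment setup quoted in the table can be written down as a small configuration sketch. The Python below is illustrative only and is not taken from the authors' repository. All function and constant names are hypothetical. It assumes that the γ grid advances in steps of 0.5 and that the random feature map is an unscaled ReLU applied to `x @ weights`. Neither detail is stated in the quoted text.

```python
import numpy as np

N_SAMPLES = 400            # n in the quoted setup
DIM = 200                  # d in the quoted setup
MAJORITY_FRACTION = 0.95   # pi in the quoted setup
N_REPLICATES = 20          # test errors are averaged over 20 replicates

ANGLES_DEG = [0, 45, 90, 135, 180]
# Assumption: the elided grid {0.5, 1, ..., 3} advances in steps of 0.5.
GAMMAS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]


def group_coefficients(theta_deg, dim=DIM):
    """Return (beta0, beta1) for an angle theta given in degrees.

    beta0 = 10 * e1 and beta1 = 10 * cos(theta) * e1 + 10 * sin(theta) * e2.
    """
    theta = np.deg2rad(theta_deg)
    beta0 = np.zeros(dim)
    beta0[0] = 10.0
    beta1 = np.zeros(dim)
    beta1[0] = 10.0 * np.cos(theta)
    beta1[1] = 10.0 * np.sin(theta)
    return beta0, beta1


def random_relu_features(x, n_features, rng):
    """Map inputs through random ReLU features with IID N(0, 1) weights.

    Assumption: no scaling of the weights or the features is applied.
    """
    weights = rng.standard_normal((x.shape[1], n_features))
    return np.maximum(x @ weights, 0.0)


def setup_grid():
    """Enumerate (theta_deg, gamma, n_features) with N = gamma * n."""
    return [(t, g, int(g * N_SAMPLES)) for t in ANGLES_DEG for g in GAMMAS]
```

Under these assumptions the grid has 30 (θ, γ) settings, and at θ = 90° the two coefficient vectors are orthogonal. How the covariates, the noise, and the fitted estimator are defined is not given in the quoted row, so the sketch stops at the configuration.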