Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

When majority rules, minority loses: bias amplification of gradient descent

Authors: François Bachoc, Jérôme Bolte, Ryan Boustany, Jean-Michel Loubes

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our results are illustrated through experiments in deep learning for tabular and image classification tasks. We illustrate our theoretical findings through numerical experiments on tabular and image-classification tasks with deep neural networks (Section 4).
Researcher Affiliation	Academia	François Bachoc: University of Lille, Institut Universitaire de France (IUF), EMAIL; Jérôme Bolte: Toulouse School of Economics, ANITI, EMAIL; Ryan Boustany: Toulouse School of Economics, EMAIL; Jean-Michel Loubes: Université de Toulouse, ANITI & Regalia INRIA, EMAIL
Pseudocode	No	The paper describes methods and theoretical findings using mathematical formulations and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks. Algorithm names like 'SGD' or 'XGBoost' are mentioned, but their procedures are not outlined in a structured pseudocode format within the paper.
Open Source Code	Yes	All code and configuration files (including seed control, training logs, and plotting scripts) are available at https://github.com/ryanboustany/bias_amplification. We follow best practices for reproducible research and ensure all experimental figures can be regenerated with a single command.
Open Datasets	Yes	We study the effect of subgroup imbalance in supervised deep learning using image (CIFAR-10 [28], EuroSAT [21]) and tabular (Adult [5]) datasets.
Dataset Splits	Yes	The original dataset has 10 classes with 5000 samples each. To create imbalance, we subsample one class (denoted $A = 0$) to retain $n_0$ samples, and keep the others ($A = 1$) unchanged with $n_1 = 9 \times 5000$. As in [26], we define the imbalance ratio as $\rho = n_0/5000$, which gives a group proportion $n_0/(n_0 + n_1) = \rho/(\rho + 9)$, and consider four imbalance levels: $\rho \in \{1\%, 10\%, 30\%, 80\%\}$. In Figure 3, we show the results for ResNet-18 (see also Appendix D for more). For $\rho = 1\%$, $\mathrm{Acc}_0$ remains close to zero for about 60 epochs, following a stereotypical training curve (see Appendix).
Hardware Specification	Yes	Experiments were conducted on a computing cluster equipped with NVIDIA A100 40GB GPUs. Each experiment ran on a single GPU unless otherwise specified.
Software Dependencies	No	The paper mentions software components like SGD, AdamW, and XGBoost, and refers to 'PyTorch models' in the NeurIPS checklist. However, it does not provide specific version numbers for these or any other software dependencies within the text of the paper.
Experiment Setup	Yes	We use SGD with a constant learning rate for image models and tabular data. In order to match our theoretical setting, no weight decay or learning rate decay schedule was applied. Models were trained from scratch without pretraining. Refer to Table 6 for more details. Table 6: Optimization hyperparameters for each task (Dataset / Model / Optimizer / Learning rate): CIFAR-10 / ResNet-18 / SGD / 1e-2; CIFAR-10 / VGG19 / SGD / 1e-2; EuroSAT / ResNet-18 / SGD / 1e-4; Adult / TabNet / SGD / 2e-2.
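
The subsampling arithmetic quoted in the Dataset Splits row can be checked with a short sketch. This is an illustration only, not the authors' code: the function names (`minority_size`, `group_proportion`, `subsample_indices`), the seed handling, and the use of uniform random subsampling are assumptions; only the class count, the 5000 samples per class, and the four imbalance levels come from the quoted text.

```python
import random

NUM_CLASSES = 10          # from the quoted text: 10 classes
SAMPLES_PER_CLASS = 5000  # from the quoted text: 5000 samples each


def minority_size(ratio, class_size=SAMPLES_PER_CLASS):
    """Number of minority samples kept: n0 = ratio * class_size."""
    return round(ratio * class_size)


def group_proportion(ratio, num_classes=NUM_CLASSES):
    """Minority share n0 / (n0 + n1), which simplifies to ratio / (ratio + 9)
    when the other num_classes - 1 = 9 classes are kept in full."""
    return ratio / (ratio + (num_classes - 1))


def subsample_indices(labels, ratio, minority_class=0, seed=0):
    """Hypothetical helper: keep a random subset of the minority class
    and every sample of the remaining classes."""
    rng = random.Random(seed)  # fixed seed so the subset is repeatable
    minority = [i for i, y in enumerate(labels) if y == minority_class]
    n0 = minority_size(ratio, class_size=len(minority))
    kept_minority = set(rng.sample(minority, n0))
    return [
        i for i, y in enumerate(labels)
        if y != minority_class or i in kept_minority
    ]


if __name__ == "__main__":
    for ratio in (0.01, 0.10, 0.30, 0.80):
        n0 = minority_size(ratio)
        print(f"ratio={ratio:.0%}  n0={n0}  share={group_proportion(ratio):.4f}")
```

At the 1% level this keeps 50 minority samples against 45 000 majority samples, so the minority share is roughly 0.11% of the training set, which is far smaller than the imbalance ratio itself suggests.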