Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bagging Provides Assumption-free Stability
Authors: Jake A. Soloff, Rina Foygel Barber, Rebecca Willett
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results validate our findings, showing that bagging successfully stabilizes even highly unstable base algorithms. [...] In this section, we study the stability of subbagging in simulation experiments. We use scikit-learn (Pedregosa et al., 2011) for all base algorithms. |
| Researcher Affiliation | Academia | Jake A. Soloff (EMAIL), Rina Foygel Barber (EMAIL), Department of Statistics, University of Chicago, 5747 S Ellis Ave, Chicago, IL 60637, USA; Rebecca Willett (EMAIL), Departments of Statistics and Computer Science, University of Chicago, 5735 S Ellis Ave, Chicago, IL 60637, USA |
| Pseudocode | Yes | Algorithm 1 (Generic Bagging): input base algorithm A; data set D with n training points; number of bags B ≥ 1; resampling distribution Qn. Algorithm 2 (Derandomized Bagging): input base algorithm A; data set D with n training points; resampling distribution Qn. Algorithm 3 (Adaptively Clipped Bagging): input base algorithm A; data set D with n training points; number of bags B ≥ 1; resampling distribution Qn; data-dependent range I(·) |
| Open Source Code | Yes | Code to reproduce all experiments is available at https://github.com/jake-soloff/subbagging-experiments. |
| Open Datasets | No | We simulate from the following data generating process: Xi ∼ iid N(0, I_d), Yi &#124; Xi ∼ Bernoulli(1 / (1 + exp(−Xiᵀθ))), with sample size n = 500 and dimension d = 200, and where θ = (0.1, …, 0.1) ∈ R^d. |
| Dataset Splits | No | The paper describes generating data for simulations with specific sample sizes (e.g., n = 500) and how test points (Xn+1) are generated from the same distribution, but does not describe splitting a larger pre-existing dataset into training, validation, or test sets. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU, CPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | We use scikit-learn (Pedregosa et al., 2011) for all base algorithms. We use sklearn.linear_model.LogisticRegression, ... sklearn.neural_network.MLPClassifier, ... sklearn.tree.DecisionTreeRegressor. |
| Experiment Setup | Yes | In each setting, we apply the base algorithm A as well as subbagging ÃB with m = n/2 samples in each bag, using B = 10000 bags. ... We use sklearn.linear_model.LogisticRegression, setting options penalty='l2', C=1e3/n and fit_intercept=False, leaving all other parameters at their default values. ... We use sklearn.neural_network.MLPClassifier, setting hidden_layer_sizes=(40,), solver="sgd", learning_rate_init=0.2, max_iter=8, and alpha=1e-4, leaving all other parameters at their default values. ... We apply sklearn.tree.DecisionTreeRegressor to train the regression trees, setting max_depth=50 and leaving all other parameters at their default values. |
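The data-generating process and the subbagging setup quoted above can be sketched as follows. This is a minimal illustration, not the authors' code (which lives in the linked repository): it simulates X ∼ N(0, I_d) and Y | X ∼ Bernoulli(1/(1+exp(−Xᵀθ))), then averages a logistic-regression base algorithm over B bags of m = n/2 points. B is reduced from the paper's 10000 so the sketch runs quickly; the resampling-without-replacement choice matches subbagging specifically.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 500, 200                  # sample size and dimension from the paper
B = 50                           # reduced from the paper's B = 10000 for speed
theta = np.full(d, 0.1)          # theta = (0.1, ..., 0.1) in R^d

# Simulate the data-generating process: X_i ~ N(0, I_d),
# Y_i | X_i ~ Bernoulli(1 / (1 + exp(-X_i^T theta)))
X = rng.standard_normal((n, d))
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta)))

# A fresh test point from the same distribution
X_test = rng.standard_normal((1, d))

# Subbagging: average the base algorithm's prediction over B bags,
# each a subsample of m = n/2 training points drawn without replacement.
m = n // 2
preds = []
for _ in range(B):
    idx = rng.choice(n, size=m, replace=False)
    base = LogisticRegression(penalty='l2', C=1e3 / n, fit_intercept=False)
    base.fit(X[idx], Y[idx])
    preds.append(base.predict_proba(X_test)[0, 1])

bagged_prediction = float(np.mean(preds))
print(bagged_prediction)  # subbagged predicted probability for the test point
```

Swapping `LogisticRegression` for `MLPClassifier` or `DecisionTreeRegressor` with the hyperparameters quoted in the table would reproduce the paper's other settings under the same loop.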