Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Are Ensembles Getting Better All the Time?
Authors: Pierre-Alexandre Mattei, Damien Garreau
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate our results on a medical problem (diagnosing melanomas using neural nets) and a wisdom of crowds experiment (guessing the ratings of upcoming movies). |
| Researcher Affiliation | Academia | Pierre-Alexandre Mattei EMAIL Université Côte d'Azur, Inria, Maasai team, Laboratoire J.A. Dieudonné, CNRS, Nice, France; Damien Garreau EMAIL Julius-Maximilians-Universität Würzburg, Institute for Computer Science / CAIDAS, Würzburg, Germany |
| Pseudocode | No | The paper presents theoretical results, theorems, and proofs but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | In Appendix G, we show similar curves for three more movies. It is also possible to produce such curves for all 20 movies using a Python notebook available at https://github.com/pamattei/Getting-Better-Ensembles. |
| Open Datasets | Yes | We use the Derma MNIST (Yang et al., 2023) data set, based on the HAM10000 collection (Tschandl et al., 2018)... based on data collected by Simoiu et al. (2019). |
| Dataset Splits | Yes | The training/validation/test split is the same as the one from Yang et al. (2023), and consists of 1,548 color images of resolution 28×28 for training, 221 similar images for validation, and 443 test images. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU model, CPU type) used for the experiments. |
| Software Dependencies | No | The paper mentions using a 'LeNet-like convolutional neural network' and 'dropout' but does not specify software versions for libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | We use a simple LeNet-like convolutional network (LeCun et al., 1998) whose fully connected layers are regularised with a dropout rate of 50% (Srivastava et al., 2014). |
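The paper's wisdom-of-crowds experiment asks whether an averaged ensemble keeps improving as members are added. The following is a minimal pure-Python sketch, not code from the paper, illustrating how one might track the error of an averaged ensemble as its size grows; the true rating, noise level, and number of guessers are all hypothetical values chosen for illustration.

```python
import random

random.seed(0)

# Hypothetical setup: 20 "crowd members" each guess a movie's true rating.
TRUE_RATING = 7.2
guesses = [random.gauss(TRUE_RATING, 1.5) for _ in range(20)]

def ensemble_error(guesses, n, target):
    """Squared error of the average of the first n guesses."""
    avg = sum(guesses[:n]) / n
    return (avg - target) ** 2

# Error of the ensemble of size n, for n = 1, ..., 20. The sequence need
# not decrease monotonically in n, which is the question the paper studies.
errors = [ensemble_error(guesses, n, TRUE_RATING)
          for n in range(1, len(guesses) + 1)]
```

Plotting `errors` against ensemble size reproduces the kind of per-movie curve the authors describe generating with their Python notebook.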