Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Visualizing the Implicit Model Selection Tradeoff
Authors: Zezhen He, Yaron Shaposhnik
JAIR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using various datasets and a simple Python interface, we demonstrate how practitioners and researchers could benefit from applying these approaches to better understand the broader impact of their model selection choices. We demonstrate how these methods can be used for various datasets from the UCI ML Repository. We next describe the datasets, training process, classification models, hyperparameters, and the DR methods used in our experiments. |
| Researcher Affiliation | Academia | Zezhen (Dawn) He EMAIL Simon Business School, University of Rochester Rochester, NY 14627. Yaron Shaposhnik EMAIL Simon Business School, University of Rochester Rochester, NY 14627. |
| Pseudocode | No | The paper describes methods in narrative and mathematical forms, but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make the code available online to facilitate exploration and adoption (see Appendix E). Appendix E. Python Programming Interface... The code is available at https://github.com/zhesimon/Comparative Meta Models. |
| Open Datasets | Yes | We use datasets from the UCI Machine Learning Repository (Asuncion & Newman, 2007) and FICO's Explainable Machine Learning Challenge (FICO, 2018). The description of the specific datasets used in this paper can be found in Appendix B. |
| Dataset Splits | Yes | We apply a standard model training and evaluation process of randomly partitioning each of the datasets into an 80% training set and a 20% test set. We evaluate the training error using five-fold CV on the training set and evaluate the test error on the test set. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU/CPU models or processor types. |
| Software Dependencies | No | The paper mentions using a 'Python interface' and references 'Scikit-learn: Machine learning in Python (Pedregosa et al., 2011)', but does not provide specific version numbers for Python, Scikit-learn, or any other key software dependencies. |
| Experiment Setup | Yes | We apply hyperparameter tuning using five-fold CV to determine the configuration with the best prediction accuracy for each model. We tune the typical hyperparameters of each model. The specific values used as hyperparameters for each model are described in Appendix C. Appendix C. Tuning Parameters for Section 4. Appendix D. Tuning Parameters for the Section 8 Case Study. |
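The evaluation protocol extracted above (random 80/20 train/test partition, five-fold CV for training error, and five-fold CV hyperparameter tuning) can be sketched in scikit-learn, which the paper cites. This is a minimal illustration only: the dataset, model, and parameter grid below are placeholders, not the paper's actual choices from Appendices B and C.

```python
# Sketch of the reported protocol: 80/20 split, five-fold CV grid search,
# CV training error and held-out test error. Dataset and grid are illustrative.
from sklearn.datasets import load_breast_cancer  # stand-in for a UCI dataset
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Random 80% training / 20% test partition, as in the "Dataset Splits" row.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Five-fold CV grid search over typical hyperparameters ("Experiment Setup" row);
# this grid is a placeholder, not the values from Appendix C.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]},
    cv=5,
)
search.fit(X_train, y_train)

# Training error via five-fold CV on the training set; test error on the 20% hold-out.
cv_accuracy = cross_val_score(search.best_estimator_, X_train, y_train, cv=5).mean()
test_accuracy = search.best_estimator_.score(X_test, y_test)
print(f"CV accuracy: {cv_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```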