Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

State of the Art: Reproducibility in Artificial Intelligence

Authors: Odd Erik Gundersen, Sigbjørn Kjensmo

AAAI 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Objective: To quantify the state of reproducibility of empirical AI research using six reproducibility metrics measuring three different degrees of reproducibility. ... A total of 400 research papers from the conference series IJCAI and AAAI have been surveyed using the metrics.
Researcher Affiliation	Academia	Odd Erik Gundersen, Sigbjørn Kjensmo Department of Computer Science Norwegian University of Science and Technology
Pseudocode	No	The paper describes concepts and metrics using natural language and mathematical formulas, but does not provide pseudocode or algorithm blocks for its own methodology.
Open Source Code	Yes	All the data and the code that has been used to calculate the reproducibility scores and generate the figures can be found on Github. (Footnote 1: https://github.com/aaai2018-paperid-62/aaai2018-paperid-62)
Open Datasets	No	This paper conducts a survey and analysis of other research papers; it does not train machine learning models, and therefore the concept of a 'training dataset' with access information or splits, as typically understood in machine learning contexts, does not apply to its own experimental methodology.
Dataset Splits	No	This paper conducts a survey and analysis of other research papers; it does not train machine learning models, and therefore the concept of 'validation dataset splits', as typically understood in machine learning contexts, does not apply to its own experimental methodology.
Hardware Specification	No	The paper describes its methodology as a survey and analysis of other papers. It does not provide any specific hardware specifications (e.g., GPU/CPU models, memory) used for conducting its own analysis or calculations.
Software Dependencies	No	The paper states that its code is available on GitHub but does not explicitly list any software dependencies with specific version numbers (e.g., Python, specific libraries) within the paper's text.
Experiment Setup	No	The paper details its survey methodology (e.g., number of papers, variables collected) but does not include explicit experimental setup details such as hyperparameters, training configurations, or system-level settings, as these are not relevant to its survey-based research method.
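
The Yes/No results in the table above can be tallied with a short script. This is a minimal sketch, not the Index's scoring method: it assumes the seven Yes/No variables are counted equally, and the names `results` and `tally` are hypothetical. The actual scoring methodology is described in Coakley et al. and may differ.

```python
# Yes/No results copied from the table above.
# Research Type and Researcher Affiliation are categorical, so they are left out.
results = {
    "Pseudocode": "No",
    "Open Source Code": "Yes",
    "Open Datasets": "No",
    "Dataset Splits": "No",
    "Hardware Specification": "No",
    "Software Dependencies": "No",
    "Experiment Setup": "No",
}


def tally(results):
    """Return (number of 'Yes' results, total number of variables)."""
    yes = sum(1 for value in results.values() if value == "Yes")
    return yes, len(results)


yes_count, total = tally(results)
# Equal weighting is an assumption made only for this illustration.
print(f"{yes_count} of {total} variables satisfied")
```

For this paper the tally is 1 of 7, which reflects the table only: several "No" results are recorded because the variable does not apply to a survey-based study, not because information was withheld.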