Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
State of the Art: Reproducibility in Artificial Intelligence
Authors: Odd Erik Gundersen, Sigbjørn Kjensmo
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Objective: To quantify the state of reproducibility of empirical AI research using six reproducibility metrics measuring three different degrees of reproducibility. ... A total of 400 research papers from the conference series IJCAI and AAAI have been surveyed using the metrics. |
| Researcher Affiliation | Academia | Odd Erik Gundersen, Sigbjørn Kjensmo; Department of Computer Science, Norwegian University of Science and Technology |
| Pseudocode | No | The paper describes concepts and metrics using natural language and mathematical formulas, but does not provide pseudocode or algorithm blocks for its own methodology. |
| Open Source Code | Yes | All the data and the code that has been used to calculate the reproducibility scores and generate the figures can be found on Github (footnote 1: https://github.com/aaai2018-paperid-62/aaai2018-paperid-62). |
| Open Datasets | No | This paper conducts a survey and analysis of other research papers; it does not train machine learning models, and therefore the concept of a 'training dataset' with access information or splits, as typically understood in machine learning contexts, does not apply to its own experimental methodology. |
| Dataset Splits | No | This paper conducts a survey and analysis of other research papers; it does not train machine learning models, and therefore the concept of 'validation dataset splits', as typically understood in machine learning contexts, does not apply to its own experimental methodology. |
| Hardware Specification | No | The paper describes its methodology as a survey and analysis of other papers. It does not provide any specific hardware specifications (e.g., GPU/CPU models, memory) used for conducting its own analysis or calculations. |
| Software Dependencies | No | The paper states that its code is available on GitHub but does not explicitly list any software dependencies with specific version numbers (e.g., Python, specific libraries) within the paper's text. |
| Experiment Setup | No | The paper details its survey methodology (e.g., number of papers, variables collected) but does not include explicit experimental setup details such as hyperparameters, training configurations, or system-level settings, as these are not relevant to its survey-based research method. |