Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Data-centric Machine Learning Research (DMLR) - 2025

[Figure: Documentation Rate of Empirical Papers by Reproducibility Variable]

[Figure: Distribution of Empirical Papers by Number of Documented Variables]

| Venue | Year | Papers | Reproducibility Score | Documentation Score | % Empirical | % Industry |
| --- | --- | --- | --- | --- | --- | --- |
| DMLR | 2025 | 13 | 0.76 | 4.55 | 84.62% | 18.18% |

Column notes:
Reproducibility Score: based on Gundersen et al. (2025). See Methods for details.
Documentation Score: the average score over the seven reproducibility variables for empirical research papers. See Methods for details.
% Empirical: percentage of papers that are empirical research rather than theoretical research.
% Industry: percentage of empirical research papers with at least one author from industry.
| Paper | Pseudocode | Open Source Code | Open Datasets | Dataset Splits | Hardware Specification | Software Dependencies | Experiment Setup | Documented Variables |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Challenge design roadmap | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 4 |
| Chronicling Germany: An Annotated Historical Newspaper Dataset | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Constructing Confidence Intervals for “the” Generalization Error – a Comprehensive Benchmark Study | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 7 |
| Data Acquisition: A New Frontier in Data-centric AI | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| Deep Learning for Accurate Diagnosis of Viral Infections through scRNA-seq Analysis: A Comprehensive Benchmark Study | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 1 |
| FlowBench: A Large Scale Benchmark for Flow Simulation over Complex Geometries | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| MONSTER: Monash Scalable Time Series Evaluation Repository | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 6 |
| Text Quality-Based Pruning for Efficient Training of Language Models | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| The FIX Benchmark: Extracting Features Interpretable to eXperts | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | 4 |
| Towards impactful challenges: post-challenge paper, benchmarks and other dissemination actions | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| V-LoL: A Diagnostic Dataset for Visual Logical Learning | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 7 |
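
The per-paper totals and per-variable documentation rates can be recomputed from the marks in the table above. The following is a minimal Python sketch, assuming the marks are transcribed by hand as 0/1 strings in the column order shown. The names `VARIABLES`, `PAPERS`, `documented_count`, and `variable_rates` are illustrative and not part of the site. The table does not indicate which papers are classified as empirical, so the sketch covers all 13 listed papers, whereas the page's rates and Documentation Score are restricted to empirical papers.

```python
# Hypothetical hand transcription of the per-paper table: one character per
# reproducibility variable, "1" = documented, "0" = not documented.
VARIABLES = [
    "Pseudocode", "Open Source Code", "Open Datasets", "Dataset Splits",
    "Hardware Specification", "Software Dependencies", "Experiment Setup",
]

PAPERS = {
    "Challenge design roadmap": "0011101",
    "Chronicling Germany: An Annotated Historical Newspaper Dataset": "0111101",
    "Constructing Confidence Intervals for “the” Generalization Error – a Comprehensive Benchmark Study": "1111111",
    "Data Acquisition: A New Frontier in Data-centric AI": "0110000",
    "Deep Learning for Accurate Diagnosis of Viral Infections through scRNA-seq Analysis: A Comprehensive Benchmark Study": "0010000",
    "FlowBench: A Large Scale Benchmark for Flow Simulation over Complex Geometries": "0111101",
    "MONSTER: Monash Scalable Time Series Evaluation Repository": "0111101",
    "SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning": "0111101",
    "Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs": "0111111",
    "Text Quality-Based Pruning for Efficient Training of Language Models": "0011001",
    "The FIX Benchmark: Extracting Features Interpretable to eXperts": "0111100",
    "Towards impactful challenges: post-challenge paper, benchmarks and other dissemination actions": "0000000",
    "V-LoL: A Diagnostic Dataset for Visual Logical Learning": "1111111",
}


def documented_count(flags: str) -> int:
    """Number of variables marked as documented for one paper."""
    return flags.count("1")


def variable_rates(papers: dict[str, str]) -> dict[str, float]:
    """Share of the given papers that document each variable."""
    n = len(papers)
    return {
        name: sum(flags[i] == "1" for flags in papers.values()) / n
        for i, name in enumerate(VARIABLES)
    }


if __name__ == "__main__":
    for title, flags in PAPERS.items():
        print(f"{documented_count(flags)}  {title}")
    for name, rate in variable_rates(PAPERS).items():
        print(f"{name}: {rate:.1%}")
    mean = sum(documented_count(f) for f in PAPERS.values()) / len(PAPERS)
    print(f"Mean documented variables over all listed papers: {mean:.2f}")
```

Over all 13 listed papers the mean is 54/13 ≈ 4.15, lower than the reported Documentation Score of 4.55. The gap is consistent with the reported score excluding non-empirical papers, though the table alone cannot confirm which papers those are.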