Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Data-centric Machine Learning Research (DMLR) - 2025

| Venue | Year | Papers | Reproducibility Score | Documentation Score | % Empirical | % Industry |
|---|---|---|---|---|---|---|
| DMLR | 2025 | 13 | 0.76 | 4.55 | 84.62% | 18.18% |

Reproducibility Score: reproducibility score based on Gundersen et al. (2025).
Documentation Score: the global mean, i.e. the average score over the seven reproducibility variables for empirical research papers.
% Empirical: percentage of papers that are empirical research rather than theoretical research.
% Industry: percentage of empirical research papers with at least one author from industry.
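The two percentages in the summary row can be converted back into paper counts. The following is a minimal sketch, assuming the published percentages are rounded to two decimals and that % Industry is taken over empirical papers only, as its column description states. The helper name `count_from_percentage` is hypothetical and not part of the source.

```python
def count_from_percentage(total: int, percentage: float) -> int:
    """Return the integer count whose share of `total` rounds to `percentage`."""
    count = round(total * percentage / 100)
    # Guard against a mismatch: the recovered count must reproduce the
    # published figure to within two-decimal rounding.
    if abs(100 * count / total - percentage) > 0.005:
        raise ValueError(f"no integer count out of {total} gives {percentage}%")
    return count


papers = 13                                         # "Papers" column
empirical = count_from_percentage(papers, 84.62)    # "% Empirical" column
industry = count_from_percentage(empirical, 18.18)  # "% Industry" column

print(f"{empirical} of {papers} papers are empirical")
print(f"{industry} of {empirical} empirical papers have an industry author")
```

Under these assumptions the sketch recovers 11 empirical papers out of 13, of which 2 have at least one industry author.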
| Paper | Pseudocode | Open Source Code | Open Datasets | Dataset Splits | Hardware Specification | Software Dependencies | Experiment Setup | Total |
|---|---|---|---|---|---|---|---|---|
| Challenge design roadmap | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 4 |
| Chronicling Germany: An Annotated Historical Newspaper Dataset | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Constructing Confidence Intervals for “the” Generalization Error – a Comprehensive Benchmark Study | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 7 |
| Data Acquisition: A New Frontier in Data-centric AI | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| Deep Learning for Accurate Diagnosis of Viral Infections through scRNA-seq Analysis: A Comprehensive Benchmark Study | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 1 |
| FlowBench: A Large Scale Benchmark for Flow Simulation over Complex Geometries | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| MONSTER: Monash Scalable Time Series Evaluation Repository | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 6 |
| Text Quality-Based Pruning for Efficient Training of Language Models | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| The FIX Benchmark: Extracting Features Interpretable to eXperts | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | 4 |
| Towards impactful challenges: post-challenge paper, benchmarks and other dissemination actions | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| V-LoL: A Diagnostic Dataset for Visual Logical Learning | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 7 |
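Each paper's total in the last column is the number of reproducibility variables marked as documented. The sketch below recomputes those totals and counts how many papers document each variable. It is an illustration only: the names `VARIABLES`, `PAPERS`, and `total` are hypothetical, and the check marks are transcribed by hand as `"1"` (documented) and `"0"` (not documented).

```python
VARIABLES = [
    "Pseudocode", "Open Source Code", "Open Datasets", "Dataset Splits",
    "Hardware Specification", "Software Dependencies", "Experiment Setup",
]

# (title, marks, listed total), transcribed from the table in row order.
# Each mark string follows the VARIABLES order: "1" = documented, "0" = not.
PAPERS = [
    ("Challenge design roadmap", "0011101", 4),
    ("Chronicling Germany: An Annotated Historical Newspaper Dataset", "0111101", 5),
    ("Constructing Confidence Intervals for “the” Generalization Error – a Comprehensive Benchmark Study", "1111111", 7),
    ("Data Acquisition: A New Frontier in Data-centric AI", "0110000", 2),
    ("Deep Learning for Accurate Diagnosis of Viral Infections through scRNA-seq Analysis: A Comprehensive Benchmark Study", "0010000", 1),
    ("FlowBench: A Large Scale Benchmark for Flow Simulation over Complex Geometries", "0111101", 5),
    ("MONSTER: Monash Scalable Time Series Evaluation Repository", "0111101", 5),
    ("SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning", "0111101", 5),
    ("Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs", "0111111", 6),
    ("Text Quality-Based Pruning for Efficient Training of Language Models", "0011001", 3),
    ("The FIX Benchmark: Extracting Features Interpretable to eXperts", "0111100", 4),
    ("Towards impactful challenges: post-challenge paper, benchmarks and other dissemination actions", "0000000", 0),
    ("V-LoL: A Diagnostic Dataset for Visual Logical Learning", "1111111", 7),
]


def total(marks: str) -> int:
    """Count the documented variables in one row."""
    return marks.count("1")


totals = [total(marks) for _, marks, _ in PAPERS]

# Titles whose recomputed total disagrees with the total listed in the table.
mismatches = [title for title, marks, listed in PAPERS if total(marks) != listed]

# Number of papers that document each variable.
per_variable = {
    name: sum(marks[i] == "1" for _, marks, _ in PAPERS)
    for i, name in enumerate(VARIABLES)
}

mean_all = sum(totals) / len(PAPERS)

print("mismatched rows:", mismatches)
for name, count in per_variable.items():
    print(f"{name}: {count}/{len(PAPERS)}")
print(f"mean total over all listed papers: {mean_all:.2f}")
```

Over all 13 rows the mean total is 54/13 ≈ 4.15. The Documentation Score of 4.55 in the summary is defined over empirical papers only, and the table does not mark which papers those are, so the sketch does not attempt to reproduce it.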