Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Position: Insights from Survey Methodology can Improve Training Data

Authors: Stephanie Eckman, Barbara Plank, Frauke Kreuter

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Splitting the collection across two screens (Condition B) led to higher rates of hate speech and offensive language annotation. Models trained on Condition B data also performed better than those trained on Condition A data across several metrics (Kern et al., 2023). This result is a clear example of how findings in the survey literature translate to the labeling task and improve the quality of training data.
Researcher Affiliation	Academia	Stephanie Eckman (1), Barbara Plank (2, 3, 4), Frauke Kreuter (5, 4, 1, 6). (1) Social Data Science Center, University of Maryland, College Park, MD, USA; (2) Center for Information and Language Processing (CIS), LMU Munich, Germany; (3) Computer Science Department, IT University of Copenhagen, Denmark; (4) Munich Center for Machine Learning (MCML), LMU Munich, Germany; (5) Institute for Statistics, LMU Munich, Germany; (6) Joint Program in Survey Methodology, University of Maryland, College Park, MD, USA.
Pseudocode	No	No pseudocode or algorithm blocks are present in the paper. The methodology is described narratively.
Open Source Code	No	The paper does not contain an explicit statement or link providing access to open-source code for the methodology described in this paper.
Open Datasets	No	This paper is a position paper that reviews literature and discusses theoretical aspects; it does not present new experimental results that require the use of a training dataset, nor does it provide access information for any dataset.
Dataset Splits	No	This paper is a position paper that discusses theoretical concepts and insights from survey methodology; it does not conduct experiments requiring training/test/validation dataset splits to be reproduced.
Hardware Specification	No	The paper is a position paper and does not describe new experiments conducted by the authors; therefore, no hardware specifications for running experiments are provided.
Software Dependencies	No	The paper is a position paper and does not describe new experiments conducted by the authors; therefore, no specific software dependencies with version numbers are provided.
Experiment Setup	No	The paper is a position paper that discusses theoretical concepts and insights; it does not conduct new experiments or present specific experimental setup details such as hyperparameters or training configurations.