Position: Insights from Survey Methodology can Improve Training Data

Authors: Stephanie Eckman, Barbara Plank, Frauke Kreuter

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Splitting the collection across two screens (Condition B) led to higher rates of hate speech and offensive language annotation. Models trained on Condition B data also performed better than those trained on Condition A data across several metrics (Kern et al., 2023). This result is a clear example of how findings in the survey literature translate to the labeling task and improve the quality of training data.
Researcher Affiliation | Academia | Stephanie Eckman (1), Barbara Plank (2, 3, 4), Frauke Kreuter (5, 4, 1, 6). (1) Social Data Science Center, University of Maryland, College Park, MD, USA; (2) Center for Information and Language Processing (CIS), LMU Munich, Germany; (3) Computer Science Department, IT University of Copenhagen, Denmark; (4) Munich Center for Machine Learning (MCML), LMU Munich, Germany; (5) Institute for Statistics, LMU Munich, Germany; (6) Joint Program in Survey Methodology, University of Maryland, College Park, MD, USA.
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. The methodology is described narratively.
Open Source Code | No | The paper does not contain an explicit statement or link providing access to open-source code for the methodology described in this paper.
Open Datasets | No | This paper is a position paper that reviews literature and discusses theoretical aspects; it does not present new experimental results that require the use of a training dataset, nor does it provide access information for any dataset.
Dataset Splits | No | This paper is a position paper that discusses theoretical concepts and insights from survey methodology; it does not conduct experiments requiring training/test/validation dataset splits to be reproduced.
Hardware Specification | No | The paper is a position paper and does not describe new experiments conducted by the authors; therefore, no hardware specifications for running experiments are provided.
Software Dependencies | No | The paper is a position paper and does not describe new experiments conducted by the authors; therefore, no specific software dependencies with version numbers are provided.
Experiment Setup | No | The paper is a position paper that discusses theoretical concepts and insights; it does not conduct new experiments or present specific experimental setup details such as hyperparameters or training configurations.