Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Identifying Unreliable and Adversarial Workers in Crowdsourced Labeling Tasks
Authors: Srikanth Jagabathula, Lakshminarayanan Subramanian, Ashwin Venkataraman
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work makes algorithmic, theoretical, and empirical contributions: Theoretically, we show that our algorithms successfully identify unreliable honest workers, workers adopting deterministic strategies, and worst-case sophisticated adversaries... Empirically, we show that filtering out outliers using our algorithms can significantly improve the accuracy of several state-of-the-art label aggregation algorithms in real-world crowdsourcing datasets. We conducted two numerical studies to demonstrate the practical value of our methods. |
| Researcher Affiliation | Academia | Srikanth Jagabathula EMAIL Department of Information, Operations, and Management Sciences Leonard N. Stern School of Business New York University, NY 10012, USA. Lakshminarayanan Subramanian EMAIL Department of Computer Science Courant Institute of Mathematical Sciences New York University, NY 10012, USA. Ashwin Venkataraman EMAIL Department of Computer Science Courant Institute of Mathematical Sciences New York University, NY 10012, USA |
| Pseudocode | Yes | Algorithm 1 (soft-penalty), Algorithm 2 (hard-penalty), and Algorithm 3 (penalty-based label aggregation) |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository. It mentions using a third-party library, 'Python networkx library', but this is not the authors' own implementation code. |
| Open Datasets | Yes | We focused on the following standard datasets: stage2 and task2: consisting of a collection of topic-document pairs labeled as relevant or non-relevant by workers on Amazon Mechanical Turk (see Tang and Lease, 2011). rte and temp: consisting of annotations by Amazon Mechanical Turk workers for different natural language processing (NLP) tasks... (see Snow et al., 2008). tweets: consisting of sentiment (positive or negative) labels for 1000 tweets (see Mozafari et al., 2014). |
| Dataset Splits | No | The paper describes the generation of synthetic data in Section 5.2, including parameters for worker honesty probability (q = 0.7), task prevalence (γ = 0.5), and worker reliability distribution (µw drawn u.a.r. from [0.8, 1.0)). However, it does not explicitly provide details about training/test/validation splits for either the synthetic data or the real-world datasets used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using the 'Python networkx library' in Section 5.2 but does not provide a specific version number for this library or for Python itself. |
| Experiment Setup | Yes | Section 5.2, 'Setup of study', details parameters for the simulation, such as 'n = 100 workers', 'probability q that a worker is honest was set to 0.7', 'prevalence γ of +1 tasks was set to 0.5', 'worker degrees according to a power-law distribution (with exponent a = 2.5) with the minimum degree equal to 5', and 'reliability µw u.a.r from the interval [0.8, 1.0)'. Additionally, Section 5.1 states 'We chose kmax = 100 in our experiments' for the KOS algorithm and describes an iterative worker removal process. |
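The simulation parameters reported in Section 5.2 (n = 100 workers, honesty probability q = 0.7, prevalence γ = 0.5, power-law worker degrees with exponent a = 2.5 and minimum degree 5, reliabilities µw drawn u.a.r. from [0.8, 1.0)) can be turned into a minimal data-generation sketch. This is an illustrative reconstruction, not the authors' code: the number of tasks, the inverse-CDF degree sampler, and the simple label-flipping adversary are all assumptions, and the paper studies richer adversary models.

```python
import random

random.seed(0)

n_workers = 100   # n = 100 workers (from Section 5.2)
n_tasks = 100     # assumed; the table does not state the task count
q = 0.7           # probability a worker is honest
gamma = 0.5       # prevalence of +1 tasks
a = 2.5           # power-law exponent for worker degrees
d_min = 5         # minimum worker degree

# True +/-1 task labels with prevalence gamma of +1
true_labels = [1 if random.random() < gamma else -1 for _ in range(n_tasks)]

# Worker honesty flags and reliabilities mu_w drawn u.a.r. from [0.8, 1.0)
honest = [random.random() < q for _ in range(n_workers)]
reliability = [random.uniform(0.8, 1.0) for _ in range(n_workers)]

def power_law_degree():
    # Inverse-CDF sampling of a power law with exponent a, truncated
    # below at d_min and above at n_tasks (one common way; assumption)
    u = random.random()
    return min(n_tasks, int(d_min * (1.0 - u) ** (-1.0 / (a - 1.0))))

degrees = [power_law_degree() for _ in range(n_workers)]

# Each worker labels `degree` distinct random tasks. Honest workers answer
# correctly with probability mu_w; here adversaries simply flip their
# answer -- one deterministic strategy, used only for illustration.
labels = {}
for w in range(n_workers):
    for t in random.sample(range(n_tasks), degrees[w]):
        ans = true_labels[t] if random.random() < reliability[w] else -true_labels[t]
        labels[(w, t)] = ans if honest[w] else -ans
```

The resulting `labels` dictionary maps (worker, task) pairs to ±1 responses, which is the input shape that label aggregation algorithms such as majority voting or KOS typically consume.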