Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing

Authors: Nihar B. Shah, Dengyong Zhou

JMLR 2016

Reproducibility assessment (variable: result, with the supporting LLM response):
Research Type: Experimental. Evidence: "In this section, we present synthetic simulations and real-world experiments to evaluate the effects of our setting and our mechanism on the final label quality. We conducted preliminary experiments on the Amazon Mechanical Turk commercial crowdsourcing platform (mturk.com) to evaluate our proposed scheme in real-world scenarios."
Researcher Affiliation: Collaboration. Evidence: Nihar B. Shah (EMAIL), Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA; Dengyong Zhou (EMAIL), Machine Learning Department, Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA.
Pseudocode: Yes. Evidence: Algorithm 1 ("Incentive mechanism for skip-based setting") and Algorithm 2 ("Incentive mechanism for the confidence-based setting").
Open Source Code: No. The paper does not provide concrete access to source code for the methodology described. It states: "The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author." This refers to data, not code.
Open Datasets: Yes. Evidence: "The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author." One task required workers to identify the breeds of dogs shown in 85 images (source of images: Khosla et al., 2011; Deng et al., 2009); another required them to identify the textures shown in 24 grayscale images (source of images: Lazebnik et al., 2005, Data set 1: Textured surfaces).
Dataset Splits: No. The paper describes how gold standard questions are distributed randomly among the N questions, but does not provide specific training/validation/test splits for model training or evaluation. For example: "The G gold standard questions are assumed to be distributed uniformly at random in the pool of N questions (of course, the worker does not know which G of the N questions form the gold standard)."
Hardware Specification: No. The paper mentions experiments conducted on the "Amazon Mechanical Turk commercial crowdsourcing platform (mturk.com)" but does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running experiments or simulations.
Software Dependencies: No. The paper does not list specific software dependencies or versions used for implementation or experimentation (e.g., Python, PyTorch, or other libraries with version numbers).
Experiment Setup: Yes. Evidence: "In this set of simulations, we set T = 0.75. We compared (a) the baseline mechanism with 5 cents for each correct answer in the gold standard, (b) the skip-based mechanism with κ = 5.9 and 1/T = 1.5, and (c) the confidence-based mechanism with κ = 5.9 cents, L = 2, α2 = 1.5, α1 = 1.4, α0 = 1, α-1 = 0.5, α-2 = 0."
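The multiplicative ("double or nothing") payment rules that these parameters configure can be sketched as follows. This is an illustrative reading of the mechanisms, not the authors' implementation: it assumes the skip-based mechanism multiplies a base payment κ by 1/T per correct gold-standard answer and pays nothing if any attempted gold answer is wrong, and that the confidence-based mechanism applies the quoted multiplier α_l for a correct answer at confidence level l and α_{-l} for a wrong one (level 0 = skip). Function names and the data-structure choices are hypothetical.

```python
def skip_based_payment(gold_results, kappa=5.9, inv_T=1.5):
    """Skip-based mechanism (cf. Algorithm 1): each correct gold answer
    multiplies the base payment kappa by 1/T; any wrong (non-skipped)
    gold answer zeroes the payment.

    gold_results: list with entries 'correct', 'wrong', or 'skip'.
    """
    if any(r == "wrong" for r in gold_results):
        return 0.0
    n_correct = sum(r == "correct" for r in gold_results)
    return kappa * inv_T ** n_correct


def confidence_based_payment(
    gold_results,
    kappa=5.9,
    alpha={2: 1.5, 1: 1.4, 0: 1.0, -1: 0.5, -2: 0.0},
):
    """Confidence-based mechanism (cf. Algorithm 2, L = 2): an answer at
    confidence level l multiplies the payment by alpha[l] if correct and
    alpha[-l] if wrong; level 0 means 'skip' (multiplier 1). Note that
    alpha[-2] = 0, so a wrong answer at the highest confidence pays nothing.

    gold_results: list of (confidence_level, is_correct) pairs;
    use (0, None) for a skipped question.
    """
    payment = kappa
    for level, is_correct in gold_results:
        if level == 0:
            continue
        payment *= alpha[level] if is_correct else alpha[-level]
    return payment
```

Under these assumed rules, a worker who answers three gold questions correctly and skips the rest would earn 5.9 x 1.5^3, about 19.9 cents, while a single wrong gold answer in the skip-based setting drops the payment to zero.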