Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing

Authors: Nihar Bhadresh Shah, Dengyong Zhou

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In preliminary experiments involving over several hundred workers, we observe a significant reduction in the error rates under our unique mechanism for the same or lower monetary expenditure. We conducted preliminary experiments on the Amazon Mechanical Turk commercial crowdsourcing platform (mturk.com) to evaluate our proposed scheme in real-world scenarios.
Researcher Affiliation | Collaboration | Nihar B. Shah, University of California, Berkeley (nihar@eecs.berkeley.edu); Dengyong Zhou, Microsoft Research (dengyong.zhou@microsoft.com)
Pseudocode | Yes | Algorithm 1: Multiplicative incentive-compatible mechanism. (A hedged code sketch of this payment rule is given after the table.)
Open Source Code | No | The paper states: "The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author." This statement refers to data, not the source code for the methodology.
Open Datasets | Yes | The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author.
Dataset Splits | No | The paper describes how worker responses were aggregated ("subsampled 3, 5, 7, 9 and 11 workers, and took a majority vote of their responses"), but it does not specify train/validation/test splits for training a machine learning model, as the paper focuses on data collection mechanisms.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We conducted the following five experiments (tasks) on Amazon Mechanical Turk: (a) identifying the Golden Gate Bridge from pictures, (b) identifying the breeds of dogs from pictures, (c) identifying heads of countries, (d) identifying continents to which flags belong, and (e) identifying the textures in displayed images. Each of these tasks comprised 20 to 126 multiple-choice questions. For each experiment, we compared (i) a baseline setting (Figure 1a) with an additive payment mechanism that pays a fixed amount per correct answer, and (ii) our skip-based setting (Figure 1b) with the multiplicative mechanism of Algorithm 1. For each experiment, and for each of the two settings, we had 35 workers independently perform the task. Upon completion of the tasks on Amazon Mechanical Turk, we aggregated the data in the following manner: for each mechanism in each experiment, we subsampled 3, 5, 7, 9 and 11 workers and took a majority vote of their responses. We averaged the accuracy across all questions and across 1,000 iterations of this subsample-and-aggregate procedure. (A code sketch of this subsample-and-aggregate evaluation is given below the table.)
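
The paper's Algorithm 1 is a skip-based multiplicative payment rule. As we read it, payment depends only on a set of G gold-standard questions: each correct gold answer multiplies the payment by 1/T (where T is a confidence threshold, with T = 1/2 giving the titular "double or nothing" behavior), skipped questions leave it unchanged, and a single wrong gold answer drops the payment to the minimum. The Python sketch below illustrates that rule; the function name multiplicative_payment and the parameters mu_max, mu_min, and T are our own illustrative choices rather than the paper's notation, so details should be checked against Algorithm 1 in the paper.

# Hypothetical sketch of a skip-based multiplicative payment rule in the spirit
# of Algorithm 1; names and defaults are illustrative, not the paper's notation.

def multiplicative_payment(gold_responses, mu_max=1.0, mu_min=0.0, T=0.5):
    """Payment for one worker, computed from G gold-standard questions.

    gold_responses: one entry per gold question: "correct", "wrong", or "skip".
    mu_max: maximum payment, earned when every gold question is answered correctly.
    mu_min: payment when any gold answer is wrong (0 gives the "nothing" in the title).
    T: confidence threshold in (1/2, 1]; T = 0.5 gives the "double or nothing" rule.
    """
    G = len(gold_responses)
    n_correct = sum(r == "correct" for r in gold_responses)
    n_wrong = sum(r == "wrong" for r in gold_responses)

    if n_wrong > 0:
        return mu_min  # a single wrong gold answer forfeits the entire bonus
    # Start from the base amount mu_max * T**G; each correct answer multiplies
    # the running payment by 1/T, and skipped questions leave it unchanged.
    return mu_max * T ** (G - n_correct)

# With T = 0.5 and three gold questions: all correct pays 1.0, one skip pays 0.5,
# and a single wrong answer pays 0.0.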
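
For the aggregation step quoted in the Experiment Setup row, the following is a minimal sketch of one way to implement the subsample-and-aggregate evaluation: repeatedly draw k workers at random, take a per-question majority vote of their responses (ignoring skips), score against the ground truth, and average over iterations. The function name majority_vote_accuracy and its arguments are hypothetical, and the tie-breaking rule is an assumption the quoted text does not specify.

import random
from collections import Counter

def majority_vote_accuracy(responses, ground_truth, k, n_iters=1000, seed=0):
    """Average accuracy of a k-worker majority vote over repeated subsamples.

    responses: one list per worker, with one answer per question (None = skipped).
    ground_truth: the correct answer for each question.
    k: number of workers to subsample per iteration (e.g., 3, 5, 7, 9, or 11).
    n_iters: number of subsample-and-aggregate iterations to average over.
    """
    rng = random.Random(seed)
    n_questions = len(ground_truth)
    total_accuracy = 0.0
    for _ in range(n_iters):
        subset = rng.sample(responses, k)
        n_correct = 0
        for q in range(n_questions):
            votes = [worker[q] for worker in subset if worker[q] is not None]
            if not votes:
                continue  # every sampled worker skipped this question
            # Counter breaks ties by first occurrence; the quoted text does not
            # state a tie-breaking rule, so treat this as an assumption.
            winner, _ = Counter(votes).most_common(1)[0]
            if winner == ground_truth[q]:
                n_correct += 1
        total_accuracy += n_correct / n_questions
    return total_accuracy / n_iters

# Example usage over the subsample sizes reported in the paper:
# accuracies = {k: majority_vote_accuracy(responses, truth, k) for k in (3, 5, 7, 9, 11)}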