Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing
Authors: Nihar Bhadresh Shah, Dengyong Zhou
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In preliminary experiments involving several hundred workers, we observe a significant reduction in the error rates under our unique mechanism for the same or lower monetary expenditure. We conducted preliminary experiments on the Amazon Mechanical Turk commercial crowdsourcing platform (mturk.com) to evaluate our proposed scheme in real-world scenarios. |
| Researcher Affiliation | Collaboration | Nihar B. Shah University of California, Berkeley nihar@eecs.berkeley.edu; Dengyong Zhou Microsoft Research dengyong.zhou@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Multiplicative incentive-compatible mechanism (an illustrative sketch appears below the table) |
| Open Source Code | No | The paper states: "The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author." This statement refers to data, not the source code for the methodology. |
| Open Datasets | Yes | The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author. |
| Dataset Splits | No | The paper describes how worker responses were aggregated ("subsampled 3, 5, 7, 9 and 11 workers, and took a majority vote of their responses"), but it does not specify typical train/validation/test dataset splits used for training a machine learning model, as the paper focuses on data collection mechanisms. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We conducted the five following experiments (tasks) on Amazon Mechanical Turk: (a) identifying the Golden Gate Bridge from pictures, (b) identifying the breeds of dogs from pictures, (c) identifying heads of countries, (d) identifying continents to which flags belong, and (e) identifying the textures in displayed images. Each of these tasks comprised 20 to 126 multiple choice questions. For each experiment, we compared (i) a baseline setting (Figure 1a) with an additive payment mechanism that pays a fixed amount per correct answer, and (ii) our skip-based setting (Figure 1b) with the multiplicative mechanism of Algorithm 1. For each experiment, and for each of the two settings, we had 35 workers independently perform the task. Upon completion of the tasks on Amazon Mechanical Turk, we aggregated the data in the following manner. For each mechanism in each experiment, we subsampled 3, 5, 7, 9 and 11 workers, and took a majority vote of their responses. We averaged the accuracy across all questions and across 1,000 iterations of this subsample-and-aggregate procedure. (A sketch of this subsample-and-aggregate procedure appears below the table.) |
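The Pseudocode row above points to Algorithm 1, which the report records only by name. As a rough illustration of how a multiplicative, skip-based payment rule of this kind can operate, here is a minimal Python sketch; the parameter names `base_pay` and `multiplier`, the zero payout on any incorrect gold-standard answer, and the neutral treatment of skipped questions are illustrative assumptions, not details quoted from the paper.

```python
def multiplicative_payment(gold_results, base_pay=0.10, multiplier=2.0):
    """Illustrative multiplicative, skip-based payment rule (assumed details).

    gold_results: one entry per gold-standard question,
                  each 'correct', 'wrong', or 'skipped'.
    """
    # "Nothing": any wrong answer on an attempted gold question forfeits the payment.
    if any(r == 'wrong' for r in gold_results):
        return 0.0
    # "Double": each correct gold answer multiplies the payment;
    # skipped questions leave it unchanged.
    pay = base_pay
    for r in gold_results:
        if r == 'correct':
            pay *= multiplier
    return pay
```

A rule of this shape rewards confident correct answers while making confident wrong answers costly, which is the incentive property the Experiment Setup row compares against the additive per-correct-answer baseline.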
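The Experiment Setup row describes a subsample-and-aggregate evaluation: for each mechanism, subsample k of the 35 workers, take a per-question majority vote, and average accuracy over 1,000 repetitions. A minimal sketch of that procedure follows; the quoted setup does not say how skipped answers enter the vote or how ties are broken, so the choices here (skips vote like any other label, ties resolved by `Counter` ordering) are assumptions, and the function and variable names are hypothetical.

```python
import random
from collections import Counter

def majority_vote_accuracy(responses, truth, k, iters=1000, seed=0):
    """Average accuracy of a k-worker majority vote over `iters` random subsamples.

    responses: dict mapping worker_id -> list of answers (one per question).
    truth:     list of ground-truth answers, in the same question order.
    """
    rng = random.Random(seed)
    workers = list(responses)
    n_questions = len(truth)
    total_accuracy = 0.0
    for _ in range(iters):
        sample = rng.sample(workers, k)  # subsample k of the available workers
        correct = 0
        for q in range(n_questions):
            votes = Counter(responses[w][q] for w in sample)
            top_answer, _ = votes.most_common(1)[0]  # ties broken by count, then insertion order
            if top_answer == truth[q]:
                correct += 1
        total_accuracy += correct / n_questions
    return total_accuracy / iters
```

Running this for k = 3, 5, 7, 9, 11 on the responses collected under each mechanism reproduces the shape of the comparison described in the setup: majority-vote accuracy as a function of the number of workers.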