Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing

Authors: Nihar Bhadresh Shah, Dengyong Zhou

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In preliminary experiments involving over several hundred workers, we observe a significant reduction in the error rates under our unique mechanism for the same or lower monetary expenditure. We conducted preliminary experiments on the Amazon Mechanical Turk commercial crowdsourcing platform (mturk.com) to evaluate our proposed scheme in real-world scenarios.
Researcher Affiliation | Collaboration | Nihar B. Shah, University of California, Berkeley (nihar@eecs.berkeley.edu); Dengyong Zhou, Microsoft Research (dengyong.zhou@microsoft.com)
Pseudocode | Yes | Algorithm 1: Multiplicative incentive-compatible mechanism. (A hedged code sketch of this payment rule is given after the table.)
Open Source Code | No | The paper states: "The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author." This statement refers to data, not the source code for the methodology.
Open Datasets | Yes | The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author.
Dataset Splits | No | The paper describes how worker responses were aggregated ("subsampled 3, 5, 7, 9 and 11 workers, and took a majority vote of their responses"), but it does not specify train/validation/test splits for training a machine learning model, as the paper focuses on data collection mechanisms.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We conducted the following five experiments (tasks) on Amazon Mechanical Turk: (a) identifying the Golden Gate Bridge from pictures, (b) identifying the breeds of dogs from pictures, (c) identifying heads of countries, (d) identifying continents to which flags belong, and (e) identifying the textures in displayed images. Each of these tasks comprised 20 to 126 multiple-choice questions. For each experiment, we compared (i) a baseline setting (Figure 1a) with an additive payment mechanism that pays a fixed amount per correct answer, and (ii) our skip-based setting (Figure 1b) with the multiplicative mechanism of Algorithm 1. For each experiment, and for each of the two settings, we had 35 workers independently perform the task. Upon completion of the tasks on Amazon Mechanical Turk, we aggregated the data in the following manner: for each mechanism in each experiment, we subsampled 3, 5, 7, 9 and 11 workers and took a majority vote of their responses. We averaged the accuracy across all questions and across 1,000 iterations of this subsample-and-aggregate procedure. (A code sketch of this subsample-and-aggregate evaluation is given below the table.)
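
The paper's Algorithm 1 is a skip-based multiplicative payment rule. As we read it, payment depends only on a set of G gold-standard questions: each correct gold answer multiplies the payment by 1/T (where T is a confidence threshold, with T = 1/2 giving the titular "double or nothing" behavior), skipped questions leave it unchanged, and a single wrong gold answer drops the payment to the minimum. The Python sketch below illustrates that rule; the function name multiplicative_payment and the parameters mu_max, mu_min, and T are our own illustrative choices rather than the paper's notation, so details should be checked against Algorithm 1 in the paper.

# Hypothetical sketch of a skip-based multiplicative payment rule in the spirit
# of Algorithm 1; names and defaults are illustrative, not the paper's notation.

def multiplicative_payment(gold_responses, mu_max=1.0, mu_min=0.0, T=0.5):
    """Payment for one worker, computed from G gold-standard questions.

    gold_responses: one entry per gold question: "correct", "wrong", or "skip".
    mu_max: maximum payment, earned when every gold question is answered correctly.
    mu_min: payment when any gold answer is wrong (0 gives the "nothing" in the title).
    T: confidence threshold in (1/2, 1]; T = 0.5 gives the "double or nothing" rule.
    """
    G = len(gold_responses)
    n_correct = sum(r == "correct" for r in gold_responses)
    n_wrong = sum(r == "wrong" for r in gold_responses)

    if n_wrong > 0:
        return mu_min  # a single wrong gold answer forfeits the entire bonus
    # Start from the base amount mu_max * T**G; each correct answer multiplies
    # the running payment by 1/T, and skipped questions leave it unchanged.
    return mu_max * T ** (G - n_correct)

# With T = 0.5 and three gold questions: all correct pays 1.0, one skip pays 0.5,
# and a single wrong answer pays 0.0.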
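
For the aggregation step quoted in the Experiment Setup row, the following is a minimal sketch of one way to implement the subsample-and-aggregate evaluation: repeatedly draw k workers at random, take a per-question majority vote of their responses (ignoring skips), score against the ground truth, and average over iterations. The function name majority_vote_accuracy and its arguments are hypothetical, and the tie-breaking rule is an assumption the quoted text does not specify.

import random
from collections import Counter

def majority_vote_accuracy(responses, ground_truth, k, n_iters=1000, seed=0):
    """Average accuracy of a k-worker majority vote over repeated subsamples.

    responses: one list per worker, with one answer per question (None = skipped).
    ground_truth: the correct answer for each question.
    k: number of workers to subsample per iteration (e.g., 3, 5, 7, 9, or 11).
    n_iters: number of subsample-and-aggregate iterations to average over.
    """
    rng = random.Random(seed)
    n_questions = len(ground_truth)
    total_accuracy = 0.0
    for _ in range(n_iters):
        subset = rng.sample(responses, k)
        n_correct = 0
        for q in range(n_questions):
            votes = [worker[q] for worker in subset if worker[q] is not None]
            if not votes:
                continue  # every sampled worker skipped this question
            # Counter breaks ties by first occurrence; the quoted text does not
            # state a tie-breaking rule, so treat this as an assumption.
            winner, _ = Counter(votes).most_common(1)[0]
            if winner == ground_truth[q]:
                n_correct += 1
        total_accuracy += n_correct / n_questions
    return total_accuracy / n_iters

# Example usage over the subsample sizes reported in the paper:
# accuracies = {k: majority_vote_accuracy(responses, truth, k) for k in (3, 5, 7, 9, 11)}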