Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing
Authors: Nihar B. Shah, Dengyong Zhou
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present synthetic simulations and real-world experiments to evaluate the effects of our setting and our mechanism on the final label quality. We conducted preliminary experiments on the Amazon Mechanical Turk commercial crowdsourcing platform (mturk.com) to evaluate our proposed scheme in real-world scenarios. |
| Researcher Affiliation | Collaboration | Nihar B. Shah, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720 USA; Dengyong Zhou, Machine Learning Department, Microsoft Research, One Microsoft Way, Redmond, WA 98052 USA |
| Pseudocode | Yes | Algorithm 1: Incentive mechanism for skip-based setting Algorithm 2: Incentive mechanism for the confidence-based setting |
| Open Source Code | No | The paper does not provide access to source code for the described methodology. It states: "The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author." — this refers to data, not code. |
| Open Datasets | Yes | The complete data, including the interface presented to the workers in each of the tasks, the results obtained from the workers, and the ground truth solutions, are available on the website of the first author. This task required workers to identify the breeds of dogs shown in 85 images (source of images: Khosla et al. (2011); Deng et al. (2009)). This task required the workers to identify the textures shown in 24 grayscale images (source of images: Lazebnik et al. (2005, Data set 1: Textured surfaces)). |
| Dataset Splits | No | The paper describes how gold standard questions are distributed randomly among N questions but does not provide specific training/test/validation splits for model training or evaluation. For example: "The G gold standard questions are assumed to be distributed uniformly at random in the pool of N questions (of course, the worker does not know which G of the N questions form the gold standard)." |
| Hardware Specification | No | The paper mentions experiments conducted on the "Amazon Mechanical Turk commercial crowdsourcing platform (mturk.com)" but does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running experiments or simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies or versions used for implementation or experimentation (e.g., Python, PyTorch, or other libraries with version numbers). |
| Experiment Setup | Yes | In this set of simulations, we set T = 0.75. We compared (a) the baseline mechanism with 5 cents for each correct answer in the gold standard, (b) the skip-based mechanism with κ = 5.9 cents and 1/T = 1.5, and (c) the confidence-based mechanism with κ = 5.9 cents, L = 2, α₂ = 1.5, α₁ = 1.4, α₀ = 1, α₋₁ = 0.5, α₋₂ = 0. |
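The payment rules behind the Experiment Setup row can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: function names and the exact normalization are assumptions, and only the parameter values (κ = 5.9 cents, multiplier 1/T = 1.5, and the α multipliers for L = 2) come from the table above. Both mechanisms share the "double or nothing" property that a confidently wrong answer on a gold-standard question zeroes the entire payment.

```python
from math import prod

# Multipliers for the confidence-based setting (L = 2), taken from the
# Experiment Setup row: a wrong answer at the highest confidence level
# (alpha for -2) is 0, which zeroes the whole product.
DEFAULT_ALPHAS = {2: 1.5, 1: 1.4, 0: 1.0, -1: 0.5, -2: 0.0}

def skip_based_payment(outcomes, kappa=5.9, T=1 / 1.5):
    """Skip-based setting (sketch). `outcomes` holds one entry per
    gold-standard question: 'correct', 'skip', or 'wrong'. A correct
    answer keeps the running multiplier, a skip scales it by T < 1,
    and any wrong answer zeroes the entire payment."""
    factors = {'correct': 1.0, 'skip': T, 'wrong': 0.0}
    return kappa * prod(factors[o] for o in outcomes)

def confidence_based_payment(levels, kappa=5.9, alphas=None):
    """Confidence-based setting (sketch). `levels` holds one signed
    confidence level per gold-standard question: +l for a correct
    answer given at confidence level l, -l for a wrong one, 0 for a
    skip. Payment is kappa times the product of the multipliers."""
    alphas = DEFAULT_ALPHAS if alphas is None else alphas
    return kappa * prod(alphas[l] for l in levels)
```

In this multiplicative form, the incentive-compatibility intuition is that answering a question with confidence p contributes an expected multiplier of p, while skipping contributes T, so a rational worker answers exactly when p > T.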