Crowdsourcing Complex Workflows under Budget Constraints

Authors: Long Tran-Thanh, Trung Dong Huynh, Avi Rosenfeld, Sarvapali Ramchurn, Nicholas Jennings

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate it on a well-known crowdsourcing-based text correction workflow using Amazon Mechanical Turk, and show that Budgeteer can achieve similar levels of accuracy to current benchmarks, but is on average 45% cheaper.
Researcher Affiliation | Academia | Long Tran-Thanh (University of Southampton, UK; ltt08r@ecs.soton.ac.uk); Trung Dong Huynh (University of Southampton, UK; tdh@ecs.soton.ac.uk); Avi Rosenfeld (Jerusalem College of Technology, Israel; rosenfa@jct.ac.il); Sarvapali D. Ramchurn (University of Southampton, UK; sdr@ecs.soton.ac.uk); Nicholas R. Jennings (University of Southampton, UK; nrj@ecs.soton.ac.uk)
Pseudocode | No | The paper describes the algorithm steps in narrative paragraphs rather than in structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide any link to source code or explicitly state that the code for its methodology is open source.
Open Datasets | Yes | We create a dataset of a total of 100 sentences. This dataset is available at http://bit.ly/1sTya7F.
Dataset Splits | No | The paper mentions simulating algorithms and randomly picking sets of responses, but it does not specify explicit train/validation/test dataset splits, percentages, or sample counts.
Hardware Specification | No | The paper mentions using Amazon Mechanical Turk as a platform but does not provide specific hardware details (e.g., CPU or GPU models, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers) that would be needed to replicate the experiment.
Experiment Setup | Yes | As per the original Soylent experiments, we use Amazon Mechanical Turk (AMT 2010) and workers are paid the same amount, i.e. $0.06 per Find task, $0.08 for Fix tasks, and $0.04 for Verify tasks. Within Soylent, regardless of the sentence difficulty or budget, a minimum of 10 Find, 5 Fix, and 5 Verify tasks are generated per sentence (as per (Bernstein et al. 2010)). In contrast, both BudgetFix and Budgeteer use variable numbers of Finds, Fixes and Verifies as per their algorithms. [...] BudgetFix requires three parameters to be tuned: Kmax, Lmax, and ε (while Kmax and Lmax in BudgetFix have a similar purpose in Budgeteer, ε is used to control the accuracy of estimation in the Find phase). As Figure 1 shows, the performance of BudgetFix can vary significantly if these parameters are poorly set. Moreover, note that the BudgetFix performance in Table 2 is based on its (manually-set) optimal parameter settings (i.e., Kmax = Lmax = 2 and ε = 0.1).
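
The task prices and fixed task minimums in the Experiment Setup quote pin down the Soylent baseline's minimum per-sentence cost. The sketch below is only an illustrative reconstruction of that arithmetic, not an implementation of Budgeteer or BudgetFix; the helper `sentence_cost`, the `PRICES` dictionary, and the example variable allocation are hypothetical names and numbers, while the prices and the 10/5/5 minimums come from the quote.

```python
# Minimal sketch: per-sentence cost arithmetic implied by the quoted Experiment Setup.
# Prices and the Soylent minimums (10 Find, 5 Fix, 5 Verify) are from the quote;
# everything else (names, the variable allocation below) is illustrative.

PRICES = {"find": 0.06, "fix": 0.08, "verify": 0.04}  # USD per task, as quoted

def sentence_cost(n_find, n_fix, n_verify, prices=PRICES):
    """Total worker payment for one sentence given the number of each task type."""
    return n_find * prices["find"] + n_fix * prices["fix"] + n_verify * prices["verify"]

# Soylent's fixed minimum allocation per sentence (Bernstein et al. 2010): $1.20.
print(f"Soylent minimum cost per sentence: ${sentence_cost(10, 5, 5):.2f}")

# BudgetFix and Budgeteer instead vary task counts per sentence; a hypothetical
# smaller allocation (numbers not from the paper) costs $0.72 under the same prices.
print(f"Example variable allocation: ${sentence_cost(6, 3, 3):.2f}")
```

This only illustrates why variable task allocation can undercut the fixed baseline; the paper's reported result is that Budgeteer reaches similar accuracy at roughly 45% lower cost on average.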