Crowdsourcing Complex Workflows under Budget Constraints

Authors: Long Tran-Thanh, Trung Dong Huynh, Avi Rosenfeld, Sarvapali Ramchurn, Nicholas Jennings

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate it on a well-known crowdsourcing-based text correction workflow using Amazon Mechanical Turk, and show that Budgeteer can achieve similar levels of accuracy to current benchmarks, but is on average 45% cheaper.
Researcher Affiliation | Academia | Long Tran-Thanh (University of Southampton, UK; ltt08r@ecs.soton.ac.uk); Trung Dong Huynh (University of Southampton, UK; tdh@ecs.soton.ac.uk); Avi Rosenfeld (Jerusalem College of Technology, Israel; rosenfa@jct.ac.il); Sarvapali D. Ramchurn (University of Southampton, UK; sdr@ecs.soton.ac.uk); Nicholas R. Jennings (University of Southampton, UK; nrj@ecs.soton.ac.uk)
Pseudocode | No | The paper describes the algorithm steps in narrative paragraphs rather than in structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide any link to source code or explicitly state that the code for its methodology is open source.
Open Datasets | Yes | We create a dataset of a total of 100 sentences. This dataset is available at http://bit.ly/1sTya7F.
Dataset Splits | No | The paper mentions simulating algorithms and randomly picking sets of responses, but it does not specify explicit train/validation/test dataset splits, percentages, or sample counts.
Hardware Specification | No | The paper mentions using Amazon Mechanical Turk as a platform but does not provide specific hardware details (e.g., CPU or GPU models, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers) that would be needed to replicate the experiment.
Experiment Setup | Yes | As per the original Soylent experiments, we use Amazon Mechanical Turk (AMT 2010) and workers are paid the same amount, i.e. $0.06 per Find task, $0.08 for Fix tasks, and $0.04 for Verify tasks. Within Soylent, regardless of the sentence difficulty or budget, a minimum of 10 Find, 5 Fix, and 5 Verify tasks are generated per sentence (as per (Bernstein et al. 2010)). In contrast, both BudgetFix and Budgeteer use variable numbers of Finds, Fixes and Verifies as per their algorithms. [...] BudgetFix requires three parameters to be tuned: Kmax, Lmax, and ε (while Kmax and Lmax in BudgetFix have a similar purpose in Budgeteer, ε is used to control the accuracy of estimation in the Find phase). As Figure 1 shows, the performance of BudgetFix can vary significantly if these parameters are poorly set. Moreover, note that the BudgetFix performance in Table 2 is based on its (manually-set) optimal parameter settings (i.e., Kmax = Lmax = 2 and ε = 0.1).
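
The task prices and fixed task minimums in the Experiment Setup quote pin down the Soylent baseline's minimum per-sentence cost. The sketch below is only an illustrative reconstruction of that arithmetic, not an implementation of Budgeteer or BudgetFix; the helper `sentence_cost`, the `PRICES` dictionary, and the example variable allocation are hypothetical names and numbers, while the prices and the 10/5/5 minimums come from the quote.

```python
# Minimal sketch: per-sentence cost arithmetic implied by the quoted Experiment Setup.
# Prices and the Soylent minimums (10 Find, 5 Fix, 5 Verify) are from the quote;
# everything else (names, the variable allocation below) is illustrative.

PRICES = {"find": 0.06, "fix": 0.08, "verify": 0.04}  # USD per task, as quoted

def sentence_cost(n_find, n_fix, n_verify, prices=PRICES):
    """Total worker payment for one sentence given the number of each task type."""
    return n_find * prices["find"] + n_fix * prices["fix"] + n_verify * prices["verify"]

# Soylent's fixed minimum allocation per sentence (Bernstein et al. 2010): $1.20.
print(f"Soylent minimum cost per sentence: ${sentence_cost(10, 5, 5):.2f}")

# BudgetFix and Budgeteer instead vary task counts per sentence; a hypothetical
# smaller allocation (numbers not from the paper) costs $0.72 under the same prices.
print(f"Example variable allocation: ${sentence_cost(6, 3, 3):.2f}")
```

This only illustrates why variable task allocation can undercut the fixed baseline; the paper's reported result is that Budgeteer reaches similar accuracy at roughly 45% lower cost on average.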