Crowdsourcing Complex Workflows under Budget Constraints
Authors: Long Tran-Thanh, Trung Dong Huynh, Avi Rosenfeld, Sarvapali Ramchurn, Nicholas Jennings
AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate it on a well-known crowdsourcing-based text correction workflow using Amazon Mechanical Turk, and show that Budgeteer can achieve similar levels of accuracy to current benchmarks, but is on average 45% cheaper. |
| Researcher Affiliation | Academia | Long Tran-Thanh, University of Southampton, UK (ltt08r@ecs.soton.ac.uk); Trung Dong Huynh, University of Southampton, UK (tdh@ecs.soton.ac.uk); Avi Rosenfeld, Jerusalem College of Technology, Israel (rosenfa@jct.ac.il); Sarvapali D. Ramchurn, University of Southampton, UK (sdr@ecs.soton.ac.uk); Nicholas R. Jennings, University of Southampton, UK (nrj@ecs.soton.ac.uk) |
| Pseudocode | No | The paper describes the algorithm steps in narrative paragraphs rather than structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide any link to source code or explicitly state that its code for the methodology is open source. |
| Open Datasets | Yes | We create a dataset of a total of 100 sentences. This dataset is available at http://bit.ly/1sTya7F. |
| Dataset Splits | No | The paper mentions simulating algorithms and randomly picking sets of responses, but it does not specify explicit train/validation/test dataset splits, percentages, or sample counts. |
| Hardware Specification | No | The paper mentions using Amazon Mechanical Turk as a platform but does not provide specific hardware details (e.g., CPU, GPU models, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers with their versions) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | As per the original Soylent experiments, we use Amazon Mechanical Turk (AMT 2010) and workers are paid the same amount, i.e. $0.06 per Find task, $0.08 for Fix tasks, and $0.04 for Verify tasks. Within Soylent, regardless of the sentence difficulty or budget, a minimum of 10 Find, 5 Fix, and 5 Verify tasks are generated per sentence (as per (Bernstein et al. 2010)). In contrast, both Budget Fix and Budgeteer use variable numbers of Finds, Fixes and Verifies as per their algorithms. [...] Budget Fix requires three parameters to be tuned: Kmax, Lmax, and ε (while Kmax and Lmax in Budget Fix have a similar purpose in Budgeteer, ε is used to control the accuracy of estimation in the Find phase). As Figure 1 shows, the performance of Budget Fix can vary significantly if these parameters are poorly set. Moreover, note that the Budget Fix performance in Table 2 is based on its (manually-set) optimal parameter settings (i.e., Kmax = Lmax = 2 and ε = 0.1). |
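
The Experiment Setup quote fixes both the per-task payments and the minimum task counts used by the Soylent baseline, so the baseline's minimum per-sentence cost can be worked out directly. The Python sketch below does that arithmetic; the payments and task counts are taken from the quote above, while the function name and the application of the abstract's "on average 45% cheaper" figure to this minimum cost are purely illustrative assumptions, not values reported by the paper.

```python
# Hedged sketch: minimum per-sentence cost of the fixed Soylent baseline,
# using the AMT payments and task counts quoted in the Experiment Setup row.
# Names and the use of the 45% average-saving figure are illustrative only.

# Payment per task type (USD), as quoted from the paper.
PAY = {"find": 0.06, "fix": 0.08, "verify": 0.04}

# Minimum tasks generated per sentence under Soylent (Bernstein et al. 2010), as quoted.
SOYLENT_MIN_TASKS = {"find": 10, "fix": 5, "verify": 5}


def sentence_cost(task_counts: dict) -> float:
    """Total AMT payment for one sentence, given task counts per phase."""
    return sum(PAY[phase] * n for phase, n in task_counts.items())


if __name__ == "__main__":
    baseline = sentence_cost(SOYLENT_MIN_TASKS)
    print(f"Soylent minimum cost per sentence: ${baseline:.2f}")  # $1.20

    # The abstract reports Budgeteer is on average ~45% cheaper; applying that
    # saving to the minimum baseline cost (an assumption for illustration):
    print(f"Implied average Budgeteer cost:    ${baseline * 0.55:.2f}")  # ~$0.66
```

This makes the scale of the budget concrete: the fixed baseline spends at least $1.20 per sentence regardless of difficulty, which is the overhead Budgeteer's variable task allocation is designed to avoid.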