Bayesian Aggregation of Categorical Distributions with Applications in Crowdsourcing

Authors: Alexandry Augustin, Matteo Venanzi, Alex Rogers, Nicholas R. Jennings

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show comparable aggregation accuracy when 60% of the workers are spammers, as other state of the art approaches do when there are no spammers. To evaluate the efficacy of our model, we use an independently gathered dataset, and introduce two new datasets; all of which include ground truth from expert annotators.
Researcher Affiliation | Collaboration | Alexandry Augustin, Southampton University, Southampton, UK, aa7e14@ecs.soton.ac.uk; Matteo Venanzi, Microsoft, London, UK, mavena@microsoft.com; Alex Rogers, Oxford University, Oxford, UK, alex.rogers@cs.ox.ac.uk; Nicholas R. Jennings, Imperial College London, UK, n.jennings@imperial.ac.uk
Pseudocode | Yes | Algorithm 1 Generative process of MBCC.
Open Source Code | Yes | The source code and datasets are available at [Augustin and Venanzi, 2017].
Open Datasets | Yes | The source code and datasets are available at [Augustin and Venanzi, 2017]. SemEval. This dataset contains judgments of sentiments within one hundred news headlines sampled from the SemEval2007 test set [Strapparava and Mihalcea, 2007; Snow et al., 2008]. IAPR-TC12. We crowdsourced a set of 16 images sampled from the IAPR-TC12 dataset [Escalante et al., 2010; Augustin and Venanzi, 2017]. Colours. We crowdsourced a set of 460 judgments of the proportion of colours in the flags of 20 countries [Augustin and Venanzi, 2017].
Dataset Splits | No | The paper mentions 'The experiments are run in an unsupervised setting, where the ground truth is never exposed to the algorithms, and is only used to measure their accuracy.' It does not provide specific percentages or counts for training, validation, or test splits.
Hardware Specification | No | The paper discusses running times (e.g., 'the running time of LinOp and Median is typically 3ms, while that of IBCC and MBCC ranges from 12s and 13s, to 6min and 28min respectively'), but does not provide any specific hardware details such as GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies | No | The paper mentions that 'our implementation uses the variational message passing algorithm [Winn and Bishop, 2005]' but does not provide specific version numbers for any software libraries, programming languages, or solvers used in the implementation.
Experiment Setup | Yes | We set the parameter of the prior probability of each confusion matrix for all workers and spammers to $A^{(k)} = 100\,I + \mathbf{1}^{T}\mathbf{1}$. Finally, we run all models a hundred times each to achieve statistically significant results at the 99% confidence level. Figure 2 (left) shows the average error on the aggregated distributions Λ on the SemEval dataset when increasing: (left) the ratio of spammers at N = 180 samples, (right) the number of samples at a ratio of spammers of 50%. (left) the IAPR-TC12 dataset at N = 180 samples, (right) the Colours dataset at N = 330 samples.
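
The confusion-matrix prior quoted in the Experiment Setup row (100 times the identity plus an all-ones matrix) can be written out explicitly. The following is a minimal sketch, not the authors' implementation. It assumes, as is common in confusion-matrix models such as IBCC, that each row of the matrix holds the pseudo-counts of a Dirichlet prior over one row of a worker's confusion matrix. The function names and the choice of three categories are hypothetical and used only for illustration.

```python
def confusion_prior(num_categories, diagonal_weight=100.0):
    """Build diagonal_weight * I + (all-ones matrix) as a list of rows.

    num_categories is an illustrative assumption; the summary above does
    not state the number of categories used in each experiment.
    """
    return [
        [diagonal_weight + 1.0 if i == j else 1.0 for j in range(num_categories)]
        for i in range(num_categories)
    ]


def prior_mean(pseudo_counts):
    """Row-normalise the pseudo-counts.

    Under the assumed Dirichlet interpretation, this is the prior mean of
    each row of the confusion matrix.
    """
    return [[a / sum(row) for a in row] for row in pseudo_counts]


# Example with three categories (hypothetical value).
A = confusion_prior(3)
mean = prior_mean(A)
```

With three categories, each diagonal entry of `A` is 101 and each off-diagonal entry is 1, so under the stated assumption the prior mean places 101/103 (about 0.98) of the probability mass on the diagonal. In other words, the prior encodes a strong belief that a worker reports the true category, while the added ones keep every off-diagonal entry strictly positive.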