reproducibilityindex.ai

Bayesian Aggregation of Categorical Distributions with Applications in Crowdsourcing

Authors: Alexandry Augustin, Matteo Venanzi, Alex Rogers, Nicholas R. Jennings

IJCAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results show comparable aggregation accuracy when 60% of the workers are spammers, as other state of the art approaches do when there are no spammers. To evaluate the efficacy of our model, we use an independently gathered dataset, and introduce two new datasets; all of which include ground truth from expert annotators.
Researcher Affiliation	Collaboration	Alexandry Augustin, Southampton University, Southampton, UK, aa7e14@ecs.soton.ac.uk; Matteo Venanzi, Microsoft, London, UK, mavena@microsoft.com; Alex Rogers, Oxford University, Oxford, UK, alex.rogers@cs.ox.ac.uk; Nicholas R. Jennings, Imperial College, London, UK, n.jennings@imperial.ac.uk
Pseudocode	Yes	Algorithm 1 Generative process of MBCC.
Open Source Code	Yes	The source code and datasets are available at [Augustin and Venanzi, 2017].
Open Datasets	Yes	The source code and datasets are available at [Augustin and Venanzi, 2017]. SemEval. This dataset contains judgments of sentiments within one hundred news headlines sampled from the SemEval-2007 test set [Strapparava and Mihalcea, 2007; Snow et al., 2008]. IAPR-TC12. We crowdsourced a set of 16 images sampled from the IAPR-TC12 dataset [Escalante et al., 2010; Augustin and Venanzi, 2017]. Colours. We crowdsourced a set of 460 judgments of the proportion of colours in the flags of 20 countries [Augustin and Venanzi, 2017].
Dataset Splits	No	The paper mentions 'The experiments are run in an unsupervised setting, where the ground truth is never exposed to the algorithms, and is only used to measure their accuracy.' It does not provide specific percentages or counts for training, validation, or test splits.
Hardware Specification	No	The paper discusses running times (e.g., 'the running time of LinOp and Median is typically 3ms, while that of IBCC and MBCC ranges from 12s and 13s, to 6min and 28min respectively'), but does not provide any specific hardware details such as GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies	No	The paper mentions that 'our implementation uses the variational message passing algorithm [Winn and Bishop, 2005]' but does not provide specific version numbers for any software libraries, programming languages, or solvers used in the implementation.
Experiment Setup	Yes	We set the parameter of the prior probability of each confusion matrix for all workers and spammers to $A^{(k)} = 100\,I + \mathbf{1}^{\top}\mathbf{1}$. Finally, we run all models a hundred times each to achieve statistically significant results at the 99% confidence level. Figure 2 (left) shows the average error on the aggregated distributions Λ on the SemEval dataset when increasing: (left) the ratio of spammers at N = 180 samples, (right) the number of samples at a ratio of spammers of 50%. (left) the IAPR-TC12 dataset at N = 180 samples, (right) the Colours dataset at N = 330 samples.
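
The prior quoted in the Experiment Setup row, 100 times the identity plus an all-ones matrix, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `confusion_prior` and `expected_confusion`, the choice of K = 3 categories, and the reading of each row as Dirichlet pseudo-counts are assumptions made here for illustration only.

```python
import numpy as np


def confusion_prior(num_categories: int, diag_weight: float = 100.0) -> np.ndarray:
    """Build A = diag_weight * I + J, where J is the K x K all-ones matrix.

    J corresponds to the outer product of a vector of ones with itself.
    (Hypothetical helper; the name and signature are not from the paper.)
    """
    identity = np.eye(num_categories)
    all_ones = np.ones((num_categories, num_categories))
    return diag_weight * identity + all_ones


def expected_confusion(prior: np.ndarray) -> np.ndarray:
    """Row-normalise the pseudo-counts.

    Assumption: each row parameterises a Dirichlet distribution, whose mean
    is the parameter vector divided by its sum.
    """
    return prior / prior.sum(axis=1, keepdims=True)


# K = 3 is an arbitrary illustrative choice, not a value from the paper.
A = confusion_prior(3)
mean_confusion = expected_confusion(A)
print(A)
print(mean_confusion)
```

Under this reading, the heavy diagonal encodes a prior belief that workers mostly report the correct category, while the added ones keep every off-diagonal entry strictly positive.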