Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Authors: Shahana Ibrahim, Xiao Fu, Nikolaos Kargas, Kejun Huang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the proposed algorithms outperform the state-of-art algorithms under a variety of scenarios.
Researcher Affiliation | Academia | Shahana Ibrahim, School of Elect. Eng. & Computer Sci., Oregon State University, Corvallis, OR 97331, ibrahish@oregonstate.edu; Xiao Fu, School of Elect. Eng. & Computer Sci., Oregon State University, Corvallis, OR 97331, xiao.fu@oregonstate.edu; Nikos Kargas, Department of Elect. & Computer Eng., University of Minnesota, Minneapolis, MN 55455, kaga005@umn.edu; Kejun Huang, Department of Computing & Info. Sci. & Eng., University of Florida, Gainesville, FL 32611, kejun.huang@ufl.edu
Pseudocode | Yes | Algorithm 1: MultiSPA
Open Source Code | No | The paper states 'The algorithms are coded in Matlab.' but does not provide any link or explicit statement about releasing the source code.
Open Datasets | Yes | We employ different UCI datasets (https://archive.ics.uci.edu/ml/datasets.html; details in Sec. B).
Dataset Splits | No | The paper states 'we use 20% of the samples to act as training data' but does not specify a separate validation set or full training/validation/test splits by percentage or count.
Hardware Specification | No | The paper does not specify any particular hardware (GPU or CPU models, etc.) used for running the experiments; it only mentions that the algorithms are coded in Matlab.
Software Dependencies | No | The paper mentions 'Matlab' and the 'MATLAB machine learning toolbox' but does not provide version numbers for these software components.
Experiment Setup | Yes | In order to train the annotators, we use 20% of the samples to act as training data. After the annotators are trained, we use them to label the unseen data samples. In practice, not all samples are labeled by an annotator due to several factors such as annotator capacity, difficulty of the task, economical issues and so on. To simulate such a scenario, each of the trained algorithms is allowed to label a data sample with probability p ∈ (0, 1]. We test the performance of all the algorithms under different p's; a smaller p means a more challenging scenario. All the results are averaged from 10 random trials. (A minimal simulation sketch of this setup follows the table.)
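The experiment setup described in the last row lends itself to a short simulation. The following is a minimal sketch, not the authors' Matlab implementation: it assumes a synthetic stand-in dataset (scikit-learn's make_classification) in place of the UCI data, three off-the-shelf classifiers as the machine annotators, and p = 0.5; these are illustrative choices, whereas the 20% training fraction and the 10 random trials come from the paper's description.

```python
# Hypothetical sketch of the annotation-simulation protocol quoted above.
# Assumptions (not from the paper): synthetic data via make_classification,
# three scikit-learn classifiers as "annotators", p = 0.5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def simulate_annotations(p=0.5, n_trials=10, seed=0):
    rng = np.random.default_rng(seed)
    results = []
    for trial in range(n_trials):
        # Synthetic stand-in for a UCI classification dataset.
        X, y = make_classification(n_samples=2000, n_classes=3,
                                   n_informative=5, random_state=trial)
        # 20% of the samples act as training data for the annotators.
        X_tr, X_un, y_tr, y_un = train_test_split(
            X, y, train_size=0.2, random_state=trial)
        annotators = [LogisticRegression(max_iter=1000),
                      GaussianNB(),
                      DecisionTreeClassifier(max_depth=3)]
        # Each trained annotator labels an unseen sample with probability p;
        # unlabeled entries are marked -1 (missing response).
        responses = np.full((len(annotators), len(y_un)), -1, dtype=int)
        for m, clf in enumerate(annotators):
            clf.fit(X_tr, y_tr)
            mask = rng.random(len(y_un)) < p
            responses[m, mask] = clf.predict(X_un[mask])
        results.append((responses, y_un))
    return results

trials = simulate_annotations(p=0.5, n_trials=10)
print(trials[0][0].shape)  # (num_annotators, num_unseen_samples)
```

Marking missing labels with -1 mirrors the incomplete-annotation scenario the paper simulates; the pairwise co-occurrence statistics that MultiSPA operates on would then be estimated from the items jointly labeled by each pair of annotators.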