Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms
Authors: Shahana Ibrahim, Xiao Fu, Nikolaos Kargas, Kejun Huang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that the proposed algorithms outperform the state-of-the-art algorithms under a variety of scenarios. |
| Researcher Affiliation | Academia | Shahana Ibrahim, School of Elect. Eng. & Computer Sci., Oregon State University, Corvallis, OR 97331, ibrahish@oregonstate.edu; Xiao Fu, School of Elect. Eng. & Computer Sci., Oregon State University, Corvallis, OR 97331, xiao.fu@oregonstate.edu; Nikos Kargas, Department of Elect. & Computer Eng., University of Minnesota, Minneapolis, MN 55455, kaga005@umn.edu; Kejun Huang, Department of Computing & Info. Sci. & Eng., University of Florida, Gainesville, FL 32611, kejun.huang@ufl.edu |
| Pseudocode | Yes | Algorithm 1: MultiSPA |
| Open Source Code | No | The paper states 'The algorithms are coded in Matlab.' but does not provide any link or explicit statement about releasing the source code. |
| Open Datasets | Yes | We employ different UCI datasets (https://archive.ics.uci.edu/ml/datasets.html; details in Sec. B). |
| Dataset Splits | No | The paper states 'we use 20% of the samples to act as training data' but does not specify a separate validation set or full training/validation/test splits by percentage or count. |
| Hardware Specification | No | The paper does not specify any particular hardware (GPU, CPU models, etc.) used for running the experiments, only mentioning that the algorithms are coded in Matlab. |
| Software Dependencies | No | The paper mentions 'Matlab' and 'MATLAB machine learning toolbox' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | In order to train the annotators, we use 20% of the samples to act as training data. After the annotators are trained, we use them to label the unseen data samples. In practice, not all samples are labeled by an annotator due to several factors such as annotator capacity, difficulty of the task, economic issues, and so on. To simulate such a scenario, each of the trained algorithms is allowed to label a data sample with probability p ∈ (0, 1]. We test the performance of all the algorithms under different values of p; a smaller p means a more challenging scenario. All the results are averaged over 10 random trials. |
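
The experiment-setup protocol above (annotators trained on 20% of the samples, each labeling an unseen sample only with probability p, results averaged over 10 random trials) can be illustrated with the minimal Python sketch below. The dataset (`load_digits` as a stand-in for a UCI dataset), the specific classifiers acting as annotators, and the majority-vote aggregation are illustrative assumptions, not the paper's setup or its MultiSPA estimator.

```python
# Minimal sketch (not the authors' code) of the annotator-simulation protocol:
# 20% of samples train the annotators, each annotator labels an unseen sample
# only with probability p, and results are averaged over 10 random trials.
import numpy as np
from sklearn.datasets import load_digits  # stand-in for a UCI dataset
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def run_trial(X, y, p, rng):
    # 20% of the samples act as training data for the annotators.
    X_tr, X_un, y_tr, y_un = train_test_split(
        X, y, train_size=0.2, random_state=int(rng.integers(1 << 31)))
    annotators = [LogisticRegression(max_iter=1000),
                  GaussianNB(),
                  DecisionTreeClassifier(max_depth=5)]
    for clf in annotators:
        clf.fit(X_tr, y_tr)
    # Each trained annotator labels an unseen sample only with probability p;
    # -1 marks a missing label.
    L = np.full((len(annotators), len(y_un)), -1, dtype=int)
    for m, clf in enumerate(annotators):
        mask = rng.random(len(y_un)) < p
        L[m, mask] = clf.predict(X_un[mask])
    # Illustrative aggregation: majority vote over the observed labels
    # (the paper instead estimates confusion matrices from co-occurrences).
    K = len(np.unique(y))
    votes = np.stack([(L == k).sum(axis=0) for k in range(K)])
    fused = votes.argmax(axis=0)
    return (fused == y_un).mean()

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
for p in (0.2, 0.5, 1.0):
    acc = np.mean([run_trial(X, y, p, rng) for _ in range(10)])
    print(f"p = {p:.1f}: mean accuracy over 10 trials = {acc:.3f}")
```

Smaller values of p leave more entries of the label matrix missing, which reproduces the "more challenging scenario" the setup describes.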