Selective Verification Strategy for Learning From Crowds

Authors: Tian Tian, Yichi Zhou, Jun Zhu

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We do extensive empirical comparisons on both synthetic and real-world datasets to show the benefits of this new learning setting as well as our proposal." "We conduct experiments on both synthetic and real-world datasets to evaluate the benefits of oracle verification, as well as to show the efficacy of our selecting strategy."
Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., CBICR Center, State Key Lab for Intell. Tech. & Systems, TNList, Tsinghua University, Beijing, China ({tiant16@mails., zhouyc15@mails., dcszj@}tsinghua.edu.cn)
Pseudocode | No | The paper describes algorithms (e.g., EM, gradient descent) in text but does not present them in a structured pseudocode or algorithm block.
Open Source Code | No | The paper contains no explicit statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | "We conduct experiments on both synthetic and real-world datasets." Bluebirds (Welinder et al. 2010): 108 bluebird pictures of 2 breeds, each image labeled by all 39 workers; 4,214 labels collected in total. Ages (Han and Jain 2014): 165 workers estimate the ages of 1,002 face images; the final estimates are discretized into 7 bins, for 10,020 labels in total. Web Search (Zhou et al. 2012): 15,567 responses collected on the relevance rating of 2,665 query-URL pairs.
Dataset Splits | No | The paper mentions datasets but does not provide specific details on training/validation/test splits, such as percentages, sample counts, or cross-validation schemes.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or other machine specifications) used for running the experiments.
Software Dependencies | No | The paper does not list ancillary software with version numbers (e.g., programming languages, libraries, or solvers) required to replicate the experiments.
Experiment Setup | Yes | "We conducted a series of experiments with different verification subset sizes B and balance hyperparameters λ. Specifically, we vary B in the range {40, 80, 120, 160, 200}, which is enough to show the changing trends of the performance. Each λ is selected in {0.5, 1, 2, 4}. For each setting, the average results on 5 randomly generated datasets are shown in the first row of Fig. 2."
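The experiment setup above amounts to a grid sweep over verification budgets B and balance hyperparameters λ, with each setting averaged over 5 randomly generated datasets. A minimal sketch of that sweep is shown below; `run_trial` is a hypothetical placeholder (the paper's actual training procedure is not public), so only the grid structure and averaging reflect the reported setup.

```python
from itertools import product

# Settings reported in the paper's experiment setup.
B_VALUES = [40, 80, 120, 160, 200]   # verification subset sizes B
LAMBDA_VALUES = [0.5, 1, 2, 4]       # balance hyperparameters lambda
NUM_DATASETS = 5                     # randomly generated datasets per setting


def run_trial(budget, lam, seed):
    """Hypothetical stand-in for one training run.

    A real replication would train the crowd-learning model with
    `budget` oracle-verified items and balance weight `lam` on the
    dataset generated from `seed`, then return its test accuracy.
    """
    return 0.0


def sweep():
    """Average each (B, lambda) setting over the random datasets."""
    results = {}
    for budget, lam in product(B_VALUES, LAMBDA_VALUES):
        scores = [run_trial(budget, lam, seed) for seed in range(NUM_DATASETS)]
        results[(budget, lam)] = sum(scores) / len(scores)
    return results


grid = sweep()
print(len(grid))  # 20 settings: 5 budgets x 4 lambdas
```

The 5 x 4 = 20 averaged results correspond to the curves the paper reports in the first row of its Fig. 2.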