Selective Verification Strategy for Learning From Crowds
Authors: Tian Tian, Yichi Zhou, Jun Zhu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on both synthetic and real-world datasets to evaluate the benefits of oracle verification, as well as to show the efficacy of our selecting strategy. |
| Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., CBICR Center, State Key Lab for Intell. Tech. & Systems TNList, Tsinghua University, Beijing, China {tiant16@mails., zhouyc15@mails., dcszj@}tsinghua.edu.cn |
| Pseudocode | No | The paper describes algorithms (e.g., EM, gradient descent) in text but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We conduct experiments on both synthetic and real-world datasets: Bluebirds (Welinder et al. 2010): There are 2 breeds among 108 bluebird pictures, and each image is labeled by all 39 workers. 4,214 labels are collected in total. Ages (Han and Jain 2014): 165 workers are asked to estimate the ages for 1,002 face images. The final estimates are discretized into 7 bins, and the dataset consists of 10,020 labels in total. Web Search (Zhou et al. 2012): 15,567 responses are collected on the relevance rating for 2,665 query-URL pairs. |
| Dataset Splits | No | The paper describes the datasets but does not specify training/validation/test splits, such as percentages, sample counts, or a cross-validation scheme. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming languages, libraries, or solvers with their respective versions) required to replicate the experiment. |
| Experiment Setup | Yes | We conducted a series of experiments with different verification subset sizes B and balance hyperparameters λ. Specifically, we vary B in the range {40, 80, 120, 160, 200}, which is enough to show the changing trends of the performance. Each λ is selected in {0.5, 1, 2, 4}. For each setting, the average results on 5 randomly generated datasets are shown in the first row of Fig. 2. |
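The experiment grid quoted above (budgets B, balance hyperparameters λ, averages over 5 random datasets) can be sketched as a simple sweep. This is a hypothetical illustration of the reported setup, not the authors' code (which is not released); `run_experiment` is a placeholder for one train/evaluate run.

```python
from itertools import product

# Grid reported in the paper: verification subset sizes B, balance
# hyperparameters lambda, and 5 randomly generated datasets per setting.
B_VALUES = [40, 80, 120, 160, 200]
LAMBDA_VALUES = [0.5, 1, 2, 4]
N_REPEATS = 5

def run_experiment(budget, lam, seed):
    """Hypothetical placeholder: train with `budget` oracle-verified
    items and balance weight `lam`, then return an accuracy score."""
    return 0.0  # dummy value; the real study reports model accuracy

# Average each (B, lambda) setting over the 5 random datasets.
results = {}
for budget, lam in product(B_VALUES, LAMBDA_VALUES):
    scores = [run_experiment(budget, lam, seed) for seed in range(N_REPEATS)]
    results[(budget, lam)] = sum(scores) / len(scores)

print(len(results))  # 20 (B, lambda) settings, i.e. 100 runs in total
```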