Selective Verification Strategy for Learning From Crowds

Authors: Tian Tian, Yichi Zhou, Jun Zhu

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We do extensive empirical comparisons on both synthetic and real-world datasets to show the benefits of this new learning setting as well as our proposal." "We conduct experiments on both synthetic and real-world datasets to evaluate the benefits of oracle verification, as well as to show the efficacy of our selecting strategy."
Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., CBICR Center, State Key Lab for Intell. Tech. & Systems, TNList, Tsinghua University, Beijing, China ({tiant16@mails., zhouyc15@mails., dcszj@}tsinghua.edu.cn)
Pseudocode | No | The paper describes algorithms (e.g., EM, gradient descent) in text but does not present them in a structured pseudocode or algorithm block.
Open Source Code | No | The paper contains no explicit statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | "We conduct experiments on both synthetic and real-world datasets." Bluebirds (Welinder et al. 2010): 108 bluebird pictures of 2 breeds, each image labeled by all 39 workers; 4,214 labels collected in total. Ages (Han and Jain 2014): 165 workers estimate the ages of 1,002 face images; the final estimates are discretized into 7 bins, for 10,020 labels in total. Web Search (Zhou et al. 2012): 15,567 responses collected on the relevance rating of 2,665 query-URL pairs.
Dataset Splits | No | The paper mentions datasets but does not provide specific details on training/validation/test splits, such as percentages, sample counts, or cross-validation schemes.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or other machine specifications) used for running the experiments.
Software Dependencies | No | The paper does not list ancillary software with version numbers (e.g., programming languages, libraries, or solvers) required to replicate the experiments.
Experiment Setup | Yes | "We conducted a series of experiments with different verification subset sizes B and balance hyperparameters λ. Specifically, we vary B in the range {40, 80, 120, 160, 200}, which is enough to show the changing trends of the performance. Each λ is selected in {0.5, 1, 2, 4}. For each setting, the average results on 5 randomly generated datasets are shown in the first row of Fig. 2."
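The experiment setup above amounts to a grid sweep over verification budgets B and balance hyperparameters λ, with each setting averaged over 5 randomly generated datasets. A minimal sketch of that sweep is shown below; `run_trial` is a hypothetical placeholder (the paper's actual training procedure is not public), so only the grid structure and averaging reflect the reported setup.

```python
from itertools import product

# Settings reported in the paper's experiment setup.
B_VALUES = [40, 80, 120, 160, 200]   # verification subset sizes B
LAMBDA_VALUES = [0.5, 1, 2, 4]       # balance hyperparameters lambda
NUM_DATASETS = 5                     # randomly generated datasets per setting


def run_trial(budget, lam, seed):
    """Hypothetical stand-in for one training run.

    A real replication would train the crowd-learning model with
    `budget` oracle-verified items and balance weight `lam` on the
    dataset generated from `seed`, then return its test accuracy.
    """
    return 0.0


def sweep():
    """Average each (B, lambda) setting over the random datasets."""
    results = {}
    for budget, lam in product(B_VALUES, LAMBDA_VALUES):
        scores = [run_trial(budget, lam, seed) for seed in range(NUM_DATASETS)]
        results[(budget, lam)] = sum(scores) / len(scores)
    return results


grid = sweep()
print(len(grid))  # 20 settings: 5 budgets x 4 lambdas
```

The 5 x 4 = 20 averaged results correspond to the curves the paper reports in the first row of its Fig. 2.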