Eliciting Categorical Data for Optimal Aggregation

Authors: Chien-Ju Ho, Rafael Frongillo, Yiling Chen

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using theoretical analysis, simulations, and experiments, we provide answers to several interesting questions. Our main results are summarized as follows: ... We conduct human-subject experiments on Amazon's Mechanical Turk and demonstrate that our optimal binary-choice interface leads to better prediction accuracy than a natural baseline interface (Section 5.3).
Researcher Affiliation | Academia | Chien-Ju Ho (Cornell University, ch624@cornell.edu); Rafael Frongillo (CU Boulder, raf@colorado.edu); Yiling Chen (Harvard University, yiling@seas.harvard.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for its methodology.
Open Datasets | No | The paper mentions using a dataset collected in "previous work [2]" but provides no concrete access information (link, DOI, specific repository, or explicit statement of public availability) for that dataset. The remaining data were collected via the authors' own human-subject experiment on Mechanical Turk, not drawn from a pre-existing public dataset.
Dataset Splits | No | The paper does not specify training, validation, or test splits (e.g., percentages, sample counts, or splitting methodology) that would allow the data partitioning to be reproduced.
Hardware Specification | No | The paper does not specify any hardware (e.g., GPU/CPU models, memory, or machine specifications) used to run its experiments.
Software Dependencies | No | The paper does not list software dependencies or version numbers for any libraries, frameworks, or tools used in the experiments.
Experiment Setup | Yes | In our experiment, workers are asked to label 20 blurred images of textures. We considered an asymmetric prior: 80% of the images were carpet and 20% were granite, and we communicated this to the workers. ... The Baseline treatment is the most commonly seen interface in crowdsourcing markets. ... In the Prob Based interface, the worker was asked whether she thinks the probability of the image being Carpet is {more than 80%, no more than 80%}. ... For our heuristics, we used the model with n = 4 and = 0.85 for every case here; ... We choose the simplest model (n = 1) for HA though the results are robust for higher n.
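The Experiment Setup row describes a threshold question posed against the communicated 80% prior ("is the probability of Carpet more than 80%?"). A minimal toy simulation of how such threshold answers could be aggregated by majority vote is sketched below; it is not the paper's model — the worker signal model, the `accuracy` value, the worker count, and the Bayesian posterior update are all illustrative assumptions.

```python
import random

def simulate(n_images=200, n_workers=5, accuracy=0.7, prior=0.8, seed=0):
    """Toy sketch (assumptions, not the paper's method): each worker sees a
    noisy binary signal of the true label, forms a posterior for 'carpet',
    and answers the threshold question 'is P(carpet) more than 80%?'.
    Answers are aggregated by simple majority vote."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_images):
        truth = rng.random() < prior  # True = carpet (80/20 asymmetric prior)
        votes = 0
        for _ in range(n_workers):
            # Worker's signal agrees with the truth with probability `accuracy`
            signal = truth if rng.random() < accuracy else not truth
            # Posterior P(carpet | signal) via Bayes' rule with known accuracy
            if signal:
                post = prior * accuracy / (
                    prior * accuracy + (1 - prior) * (1 - accuracy))
            else:
                post = prior * (1 - accuracy) / (
                    prior * (1 - accuracy) + (1 - prior) * accuracy)
            votes += post > prior  # worker answers "more than 80%"
        guess = votes > n_workers / 2  # majority of threshold answers
        correct += guess == truth
    return correct / n_images

print(simulate())
```

With these assumed parameters, majority-aggregated threshold answers recover the true label on most images; the paper's actual mechanism and parameters differ.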