The Effectiveness of Peer Prediction in Long-Term Forecasting

Authors: Debmalya Mandal, Goran Radanović, David Parkes (pp. 2160-2167)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through a large-scale experiment on Amazon Mechanical Turk (MTurk), we investigate whether peer prediction methods can be used to complement methods of proper scoring rules, and improve engagement of users and ultimately the quality of forecasts."
Researcher Affiliation | Academia | Data Science Institute, Columbia University; Max Planck Institute for Software Systems; John A. Paulson School of Engineering and Applied Sciences, Harvard University. Contact: dm3557@columbia.edu, gradanovic@mpi-sws.org, parkes@eecs.harvard.edu
Pseudocode | No | The paper describes its methods (the Brier scoring rule and the Correlated Agreement (CA) mechanism) using mathematical formulas and prose, but does not include any structured pseudocode or algorithm blocks (an illustrative sketch of both methods follows this table).
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology or platform is openly available.
Open Datasets | No | The paper describes collecting data via Amazon Mechanical Turk for forecasting questions, and mentions that the supplementary material includes the questions and outcomes. However, it does not provide a direct link, DOI, repository, or formal citation for public access to the raw experimental dataset.
Dataset Splits | No | The paper reports an experiment with human participants and incentive schemes, not a machine learning model, so training, validation, and test splits are not applicable.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, cloud instances) used to conduct the experiments or analysis.
Software Dependencies | No | The paper mentions Amazon Mechanical Turk as the platform and discusses the Brier scoring rule and the Correlated Agreement mechanism as methods, but it does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | The paper details various aspects of the experimental setup, including: 'Overall, 945 workers out of the 1400 workers who participated in the recruitment HIT signed up for the forecasting HIT, but we considered only 891 of the 945 workers'; the definition of four treatments (SR, PP, SR+PPRank, SR+PP); 'The recruitment HIT was posted on October 6, 2018, while the forecasting HIT was posted on October 7 and was online till October 13'; and 'We asked the workers to provide forecasts on 18 different questions'. It also specifies that 'We chose 0.71 as the threshold' for easy questions, that payments were normalized to '4 cents on average for each task and each day', and that the 'optimal value of a was 2' for extremizing forecasts (a sketch of extremizing and payment normalization also follows this table).
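
The two incentive schemes referenced in the Pseudocode and Software Dependencies rows, the Brier scoring rule and the Correlated Agreement (CA) mechanism, appear in the paper only as formulas and prose. The snippet below is a minimal illustrative sketch of both ideas rather than the paper's implementation: it assumes a binary event, binarized reports, and an identity scoring matrix for CA, and the function names (brier_payment, ca_score) are placeholders; the paper's exact peer matching and report encoding may differ.

```python
def brier_payment(forecast, outcome):
    """Quadratic (Brier-based) proper scoring rule for a binary event.

    forecast: reported probability that the event occurs, in [0, 1]
    outcome:  1 if the event occurred, 0 otherwise
    Returns a score in [0, 1]; truthful reporting maximizes the expected score.
    """
    return 1.0 - (forecast - outcome) ** 2


def ca_score(bonus_i, bonus_j, penalty_i, penalty_j, delta=None):
    """Correlated Agreement (CA) style score for one worker on one bonus task.

    bonus_i, bonus_j:     worker i's and peer j's (binarized) reports on a shared task
    penalty_i, penalty_j: their reports on two distinct, non-shared tasks
    delta:                scoring matrix as a function of two reports;
                          plain agreement (identity matrix) if omitted
    Score = Delta(bonus pair) - Delta(penalty pair), so agreement on the shared
    task is rewarded only beyond baseline agreement on unrelated tasks.
    """
    if delta is None:
        delta = lambda a, b: 1.0 if a == b else 0.0
    return delta(bonus_i, bonus_j) - delta(penalty_i, penalty_j)


# Example: a forecast of 0.8 on an event that occurred scores 0.96;
# agreeing with the peer on the bonus task but not on the penalty pair scores 1.
print(brier_payment(0.8, 1))   # 0.96
print(ca_score(1, 1, 1, 0))    # 1.0
```

Subtracting the penalty-task term is what distinguishes CA from naive output agreement: a worker who reports the same thing everywhere gains nothing in expectation, since the bonus and penalty agreements cancel.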
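
The Experiment Setup row mentions extremizing the aggregated forecasts (with a = 2 reported as the best exponent) and normalizing payments to 4 cents on average per task per day. The sketch below is an assumption-laden illustration: it uses the common extremizing transform p^a / (p^a + (1 - p)^a) and a simple multiplicative rescaling of payments, neither of which is confirmed as the paper's exact formula.

```python
def extremize(p, a=2.0):
    """Push an aggregated forecast p in (0, 1) toward 0 or 1.

    Uses the common transform p**a / (p**a + (1 - p)**a). The paper reports
    a = 2 as the best-performing exponent; the exact functional form used
    there should be checked against the original text.
    """
    return p ** a / (p ** a + (1.0 - p) ** a)


def normalize_payments(raw_scores, target_avg_cents=4.0):
    """Rescale raw incentive scores so their mean matches a target payment.

    The paper normalized payments to 4 cents on average per task per day;
    this multiplicative rescaling assumes non-negative scores with a positive
    mean and stands in for whatever normalization the authors actually used.
    """
    mean = sum(raw_scores) / len(raw_scores)
    return [s * target_avg_cents / mean for s in raw_scores]


# Example: a 0.7 aggregate forecast is pushed out to about 0.845 with a = 2,
# and raw scores averaging 1.0 are rescaled to average 4 cents.
print(round(extremize(0.7), 3))              # 0.845
print(normalize_payments([0.5, 1.0, 1.5]))   # [2.0, 4.0, 6.0]
```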