Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

The Surprising Effectiveness of SP Voting with Partial Preferences

Authors: Hadi Hosseini, Debmalya Mandal, Amrit Puhan

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through a large-scale crowdsourcing experiment on MTurk, we show that both of our approaches outperform conventional preference aggregation algorithms for the recovery of ground truth rankings, when measured in terms of Kendall-Tau distance and Spearman's ρ. We conduct a human-subject study with 432 participants recruited from Amazon's Mechanical Turk (MTurk) to empirically evaluate the performance of our SP algorithms using metrics such as the Kendall-Tau distance from the full ground truth ranking and Spearman's rank correlation coefficient. We further analyze the collected data and demonstrate that voters' behavior in the experiment, including the minority of the experts, and the SP phenomenon, can be correctly simulated by a concentric mixtures of Mallows model.
Researcher Affiliation	Academia	Hadi Hosseini, College of Information Sciences and Technology, Penn State University, USA; Debmalya Mandal, Department of Computer Science, University of Warwick, UK; Amrit Puhan, College of Information Sciences and Technology, Penn State University, USA
Pseudocode	Yes	Explanation and pseudocode for Partial-SP and Aggregated-SP is provided in Appendix D.2 and Appendix D.3, respectively. ALGORITHM 1: Extract-Reports, ALGORITHM 2: Partial-SP, ALGORITHM 3: Aggregated-SP Aggregation.
Open Source Code	Yes	The dataset can be found here: https://github.com/amrit19/Surprisingly-Popular-Voting-Partial. The associated NeurIPS checklist also indicates that code is provided for reproducibility.
Open Datasets	Yes	The survey encompassed three distinct domains: (i) The geography dataset contains 36 countries with their population estimates, according to the United Nations, (ii) The movies dataset contains 36 movies with their lifetime box-office gross earnings, and (iii) The paintings dataset contains 36 paintings with their latest auction prices. The dataset can be found here: https://github.com/amrit19/Surprisingly-Popular-Voting-Partial
Dataset Splits	No	The paper describes a human-subject crowdsourcing experiment and then evaluates aggregation algorithms on the collected data. It does not mention explicit training, validation, or test splits of a dataset in the context of machine learning model training or hyperparameter tuning.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instances) used to run the experiments or simulations.
Software Dependencies	No	The paper mentions the use of "Stan [8]" for Bayesian inference but does not specify its version or the versions of any other software libraries or programming languages used for implementation.
Experiment Setup	Yes	Each participant was presented with a subset of 5 alternatives, selected based on an inter-alternative gap of 6 positions within the ground-truth ranking. We tested subset sizes of 4 to 6 and inter-alternative gaps of 3 to 8... For each combination of 12 subsets, 9 elicitation formats, and 3 domains, each question received 16 responses. ... In our experiments we use = 0.55 and = 0.1.
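
The Research Type entry above names two evaluation metrics, the Kendall-Tau distance and Spearman's rank correlation coefficient. As a rough illustration of what these measure, the following is a minimal Python sketch, not the authors' evaluation code. It assumes both rankings are complete, tie-free orderings of the same items; the function names and the five-item example rankings are hypothetical. The distance here is the raw count of discordant pairs, and the paper may normalize it differently.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Count the item pairs that the two rankings place in opposite order."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    discordant = 0
    for x, y in combinations(ranking_a, 2):
        # A pair is discordant when the sign of the position gap flips.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant


def spearman_rho(ranking_a, ranking_b):
    """Spearman's rho for two tie-free rankings of the same n >= 2 items."""
    n = len(ranking_a)
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    # Sum of squared differences between each item's two rank positions.
    d_squared = sum((i - pos_b[item]) ** 2 for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical rankings, for illustration only (not data from the paper).
ground_truth = ["A", "B", "C", "D", "E"]
aggregated = ["B", "A", "C", "E", "D"]

print(kendall_tau_distance(ground_truth, aggregated))
print(spearman_rho(ground_truth, aggregated))
```

A lower Kendall-Tau distance and a higher Spearman's ρ both indicate that an aggregated ranking is closer to the ground truth.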