Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Re-Ranking Voting-Based Answers by Discarding User Behavior Biases

Authors: Xiaochi Wei, Heyan Huang, Chin-Yew Lin, Xin Xin, Xianling Mao, Shangguang Wang

IJCAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments in real data demonstrate how the ranking performance of the proposed model outperforms traditional methods with biases ignored by 15.1% in precision@1, and 11.7% in the mean reciprocal rank.
Researcher Affiliation	Collaboration	Xiaochi Wei¹, Heyan Huang¹, Chin-Yew Lin², Xin Xin¹, Xianling Mao¹, Shangguang Wang³ (¹BJ ER Center of HVLIP&CC, School of Comp. Sci., Beijing Institute of Technology, Beijing, China; ²Microsoft Research Asia, Beijing, China; ³State Key Lab. of Net. and Swit. Tech., Beijing Univ. of Posts and Tele., Beijing, China)
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not explicitly state that the source code for the described methodology is publicly available, nor does it provide a direct link to a code repository.
Open Datasets	Yes	We collect a large dataset of cQA, including more than 110,000 questions... from Chinese cQA site Guokr (http://www.guokr.com/). Every item (question, answer and vote) has a time stamp.
Dataset Splits	No	The paper states, 'The former 5%–30% of votes on test questions is used as training data, together with the votes on other questions in the dataset.' This implies a train/test split, but the paper does not mention a validation set or describe the split procedure in enough detail to reproduce it.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments.
Software Dependencies	No	The paper does not list the software libraries or solvers used, or their version numbers, needed to replicate the experiments.
Experiment Setup	Yes	The paper discusses the use of a logistic function and a parameter α to balance biases: 'A logistic function $\sigma(x) = 1/(1 + e^{-x})$ is utilized to describe these two biases...' and 'α is employed to balance these two parts: $\gamma_{au} = \alpha A_a + (1 - \alpha) P_{au}$'. Section 4.4, 'Parameter Analyses', further analyzes parameter α and the use of 5% to 30% of votes as training data.
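
The Research Type row reports gains in precision@1 and mean reciprocal rank (MRR). As a rough illustration of what these two metrics measure, here is a minimal Python sketch. It assumes each question has a single reference "best" answer and a ranked list of answer IDs; the page does not say how the paper defines its ground truth, and the function names and toy data are hypothetical.

```python
from typing import Sequence


def precision_at_1(rankings: Sequence[Sequence[str]], best: Sequence[str]) -> float:
    """Share of questions whose top-ranked answer is the reference best answer."""
    if not rankings or len(rankings) != len(best):
        raise ValueError("rankings must be non-empty and align with best")
    hits = sum(1 for ranked, gold in zip(rankings, best) if ranked and ranked[0] == gold)
    return hits / len(rankings)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], best: Sequence[str]) -> float:
    """Mean of 1/rank of the reference answer; a missing answer contributes 0."""
    if not rankings or len(rankings) != len(best):
        raise ValueError("rankings must be non-empty and align with best")
    total = 0.0
    for ranked, gold in zip(rankings, best):
        if gold in ranked:
            # Ranks are 1-based, list indices are 0-based.
            total += 1.0 / (list(ranked).index(gold) + 1)
    return total / len(rankings)


# Toy data (hypothetical): three questions, each with a ranked answer list.
rankings = [["a1", "a2", "a3"], ["b2", "b1"], ["c3", "c1", "c2"]]
best = ["a1", "b1", "c2"]
print(precision_at_1(rankings, best))        # only the first question ranks its best answer first
print(mean_reciprocal_rank(rankings, best))  # (1 + 1/2 + 1/3) / 3
```

Precision@1 only rewards a correct top answer, while MRR also gives partial credit when the reference answer appears lower in the list.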
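
The Dataset Splits row quotes a time-based split: the earliest 5%–30% of votes on each test question join the training data, together with all votes on other questions. The sketch below shows one way such a split could be built. The `Vote` record, its fields, and the floor rounding of the per-question cutoff are assumptions for illustration, not details given in the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Vote:
    # Hypothetical vote record; the paper only says every item has a time stamp.
    question_id: str
    answer_id: str
    timestamp: float


def temporal_split(
    votes: Iterable[Vote], test_questions: Set[str], fraction: float
) -> Tuple[List[Vote], List[Vote]]:
    """Return (train, held_out).

    All votes on non-test questions go to train. For each test question,
    the earliest `fraction` of its votes (rounded down) go to train and the
    remaining, later votes are held out for evaluation.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    train: List[Vote] = []
    held_out: List[Vote] = []
    per_question: Dict[str, List[Vote]] = {}
    for v in votes:
        if v.question_id in test_questions:
            per_question.setdefault(v.question_id, []).append(v)
        else:
            train.append(v)
    for qvotes in per_question.values():
        qvotes.sort(key=lambda v: v.timestamp)  # oldest votes first
        cutoff = math.floor(fraction * len(qvotes))
        train.extend(qvotes[:cutoff])
        held_out.extend(qvotes[cutoff:])
    return train, held_out


# Hypothetical usage: q1 is a test question with 10 votes, q2 is not.
votes = [Vote("q1", "a", float(t)) for t in (5, 2, 9, 0, 7, 1, 3, 8, 4, 6)]
votes += [Vote("q2", "b", float(t)) for t in range(4)]
train, held_out = temporal_split(votes, {"q1"}, 0.3)
print(len(train), len(held_out))
```

Sorting by time stamp before cutting keeps the held-out votes strictly later than the training votes for each test question, which matches the intent of using "the former" votes for training.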
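
The Experiment Setup row quotes a logistic function and an α-weighted blend, $\gamma_{au} = \alpha A_a + (1 - \alpha) P_{au}$. The Python sketch below shows only the arithmetic of these two pieces, assuming $A_a$ and $P_{au}$ are already-computed scalars in [0, 1]. How the paper actually derives these terms is not described on this page, so the raw inputs and the helper names `logistic` and `combined_score` are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function sigma(x) = 1 / (1 + exp(-x))."""
    # Two algebraically equal forms, chosen so exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_score(a_a: float, p_au: float, alpha: float) -> float:
    """gamma_au = alpha * A_a + (1 - alpha) * P_au, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * a_a + (1.0 - alpha) * p_au


# Hypothetical inputs: squash two raw signals, then blend them with alpha.
a_a = logistic(0.8)    # stand-in for A_a
p_au = logistic(-0.3)  # stand-in for P_au
for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(combined_score(a_a, p_au, alpha), 4))
```

At α = 0 the score reduces to $P_{au}$ and at α = 1 to $A_a$; sweeping α between these extremes is the kind of sensitivity study the row attributes to Section 4.4.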