Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Computing Parametric Ranking Models via Rank-Breaking

Authors: Hossein Azari Soufiani, David Parkes, Lirong Xia

ICML 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results are presented to show the computational efficiency along with statistical performance of the proposed method. We conduct experimental studies to compare our algorithm to the MC-EM algorithm for RUMs. We consider RUMs with normal distributions and study running time and Kendall correlation. Experimental results show that our algorithm runs much faster than the MC-EM algorithm while achieving comparable, and sometimes even better Kendall correlation.
Researcher Affiliation	Academia	Hossein Azari Soufiani EMAIL David C. Parkes EMAIL Harvard University, 33 Oxford Street, Cambridge, MA 02138 USA Lirong Xia EMAIL Rensselaer Polytechnic Institute, Troy, NY 12180, USA
Pseudocode	Yes	Algorithm 1 GMM_G(D_r): For all a, a', compute X_G^{a≻a'}(D_r). Compute GMM_G(D_r) according to (2) using the moment conditions in (3) (e.g., using gradient descent). Return GMM_G(D_r).
Open Source Code	Yes	The code is provided in the R package StatRank (Chen & Azari Soufiani, 2013). We plan to extend the algorithms and analysis to partial orders, non-location families such as RUMs parameterized by mean and variance, and to GRUMs (Azari Soufiani et al., 2013c) and GRUMs with multiple types (Azari Soufiani et al., 2013b). URL http://cran.r-project.org/web/packages/StatRank/index.html.
Open Datasets	No	The synthetic datasets are generated as follows. Let m = 5. The ground truth γ is generated from the Dirichlet distribution Dirichlet(1), which is a distribution on an m-dimensional unit simplex. Then, for any given γ we generate up to n = 200 full rankings from the location family with normal distributions. This describes how the data was generated, but does not provide access information for a public dataset.
Dataset Splits	No	The paper mentions generating up to n = 200 full rankings but does not specify any training, validation, or test dataset splits or cross-validation setup.
Hardware Specification	Yes	All experiments are run on a 2.4 GHz, Intel Core 2 Duo 32-bit laptop.
Software Dependencies	No	The code is provided in the R package StatRank (Chen & Azari Soufiani, 2013). While an R package is mentioned, specific version numbers for R or any dependent libraries are not provided.
Experiment Setup	No	The paper describes how synthetic datasets were generated (e.g., m=5, n=200 rankings), but it does not specify concrete experimental setup details such as hyperparameters (learning rates, batch sizes, epochs) for the algorithms used (GMM or MC-EM).
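
The synthetic data generation quoted in the Open Datasets row, together with the pairwise statistics that Algorithm 1 starts from, can be illustrated with a minimal Python sketch. This is not the authors' code (which is in the R package StatRank). It assumes that each alternative's utility is normal with mean γ_a and unit variance, and that the breaking G extracts every pairwise comparison from each full ranking; neither detail is stated in the record above. The function names and the seed are hypothetical.

```python
import random


def sample_dirichlet_ones(m, rng):
    """Draw from Dirichlet(1, ..., 1) by normalizing independent Gamma(1, 1) draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_ranking(gamma, rng, sigma=1.0):
    """Sample one full ranking from a normal location family.

    Assumption: utility of alternative a is N(gamma[a], sigma^2), and
    alternatives are ranked by decreasing utility.
    """
    utilities = [rng.gauss(mu, sigma) for mu in gamma]
    return sorted(range(len(gamma)), key=lambda a: utilities[a], reverse=True)


def pairwise_fractions(rankings, m):
    """frac[a][b] = fraction of rankings in which a is ranked above b.

    Assumption: every pair in each ranking is used (all comparisons kept).
    """
    counts = [[0] * m for _ in range(m)]
    for ranking in rankings:
        for i in range(m):
            for j in range(i + 1, m):
                counts[ranking[i]][ranking[j]] += 1
    n = len(rankings)
    return [[c / n for c in row] for row in counts]


rng = random.Random(0)  # arbitrary seed, chosen only for repeatability
m, n = 5, 200           # values quoted in the table above
gamma = sample_dirichlet_ones(m, rng)
rankings = [sample_ranking(gamma, rng) for _ in range(n)]
pairwise = pairwise_fractions(rankings, m)
```

The sketch stops at the pairwise fractions; the estimation step (equations (2) and (3) of the paper) is not reproduced here because the record does not include those equations.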