Data-driven Rank Breaking for Efficient Rank Aggregation
Authors: Ashish Khetan, Sewoong Oh
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present the main theoretical results accompanied by corresponding numerical simulations in this section. In all our experiments, unless otherwise stated, we fix d = 1024, n = 128000, κj = κ = 128, and ℓj = ℓ = 16, and each point is averaged over 100 instances. Also, each sample is a partial ranking from a set of κ alternatives chosen uniformly at random, where the partial ranking is from a PL model with weights θ chosen i.i.d. uniformly over [−b, b] with b = 2. In the top left figure, we vary the number of separators ℓj = ℓ, and in the top right we vary the number of samples n. We see the mean squared error scaling as 1/(ℓn) as predicted by our theorem. [...] On real-world datasets on sushi preferences (Kamishima, 2003), we show that the data-driven rank-breaking improves over Generalized Method-of-Moments (GMM) proposed by Azari Soufiani et al. (2013). This is widely used for rank aggregation, for instance in (Azari Soufiani et al., 2013; 2012; Maystre & Grossglauser, 2015b). The dataset consists of complete rankings over 10 types of sushi from n = 5000 individuals. We follow the experimental scenarios of the GMM in (Azari Soufiani et al., 2013) for fair comparisons. To validate our approach, we first take the estimated PL weights of the 10 types of sushi, using Hunter's implementation (Hunter, 2004) of the ML estimator, over the entire input data of 5000 complete rankings. We take the thus-created output as the ground truth θ∗. To create partial rankings and compare the performance of the data-driven rank-breaking to the state-of-the-art GMM approach in Figure 6, we first fix ℓ = 6 and vary n to simulate the top-ℓ-separators scenario by removing the known ordering among the bottom 10 − ℓ alternatives for each sample in the dataset (left). We next fix n = 1000 and vary ℓ to simulate top-ℓ-separators scenarios (right). Each point is averaged over 1000 instances. The mean squared error is plotted for both algorithms. |
| Researcher Affiliation | Academia | Ashish Khetan KHETAN2@ILLINOIS.EDU ISE Department, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA Sewoong Oh SWOH@ILLINOIS.EDU ISE Department, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | No explicit statement about releasing source code or a link to a code repository for the methodology described in this paper was found. |
| Open Datasets | Yes | On real-world datasets on sushi preferences (Kamishima, 2003) |
| Dataset Splits | No | "In all our experiments, unless otherwise stated, we fix d = 1024, n = 128000, κj = κ = 128, and ℓj = ℓ = 16, and each point is averaged over 100 instances. [...] Each point is averaged over 1000 instances." (Explanation: The paper describes averaging over instances but does not provide specific train/test/validation splits, percentages, or detailed cross-validation methodology.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | "Hunter's implementation (Hunter, 2004) of the ML estimator" (Explanation: The paper mentions a specific implementation but does not provide version numbers for any software dependencies or libraries used.) |
| Experiment Setup | Yes | In all our experiments, unless otherwise stated, we fix d = 1024, n = 128000, κj = κ = 128, and ℓj = ℓ = 16, and each point is averaged over 100 instances. Also, each sample is a partial ranking from a set of κ alternatives chosen uniformly at random, where the partial ranking is from a PL model with weights θ chosen i.i.d. uniformly over [−b, b] with b = 2. [...] To create partial rankings and compare the performance of the data-driven rank-breaking to the state-of-the-art GMM approach in Figure 6, we first fix ℓ = 6 and vary n to simulate the top-ℓ-separators scenario by removing the known ordering among the bottom 10 − ℓ alternatives for each sample in the dataset (left). We next fix n = 1000 and vary ℓ to simulate top-ℓ-separators scenarios (right). |
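The synthetic setup quoted in the Experiment Setup row (partial rankings drawn from a Plackett-Luce model with weights uniform over [−b, b]) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the top-ℓ truncation used to represent a partial ranking are our assumptions; parameter names (d, n, κ as `kappa`, ℓ as `ell`, b) follow the paper.

```python
import numpy as np

def sample_pl_ranking(theta, items, rng):
    """Draw a full Plackett-Luce ranking over `items` by sequential
    sampling without replacement, with selection probabilities
    proportional to exp(theta)."""
    remaining = list(items)
    ranking = []
    for _ in range(len(items)):
        p = np.exp(theta[remaining])
        p = p / p.sum()
        pick = rng.choice(len(remaining), p=p)
        ranking.append(remaining.pop(pick))
    return ranking

def make_dataset(d=1024, n=128000, kappa=128, ell=16, b=2.0, seed=0):
    """Generate n samples; each is a partial ranking over kappa
    alternatives chosen uniformly at random from d items, represented
    here (an assumption) as an ordered top-ell list plus an unordered
    remainder set. Weights theta ~ i.i.d. Uniform[-b, b]."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-b, b, size=d)
    samples = []
    for _ in range(n):
        items = rng.choice(d, size=kappa, replace=False)
        full = sample_pl_ranking(theta, items, rng)
        samples.append((full[:ell], set(full[ell:])))
    return theta, samples
```

Running the full d = 1024, n = 128000 configuration this way is slow in pure Python; the sketch is meant only to pin down how the quoted parameters interact (e.g., `make_dataset(d=20, n=5, kappa=8, ell=3)` for a toy run).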