Data-driven Rank Breaking for Efficient Rank Aggregation
Authors: Ashish Khetan, Sewoong Oh
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present the main theoretical results accompanied by corresponding numerical simulations in this section. In all our experiments, unless otherwise stated, we fix d = 1024, n = 128000, κj = κ = 128, and ℓj = ℓ = 16, and each point is averaged over 100 instances. Also, each sample is a partial ranking from a set of κ alternatives chosen uniformly at random, where the partial ranking is from a PL model with weights θ chosen i.i.d. uniformly over [−b, b] with b = 2. In the top left figure, we vary the number of separators ℓj = ℓ, and in the top right we vary the number of samples n. We see the mean squared error scaling as 1/(ℓn) as predicted by our theorem. [...] On real-world datasets on sushi preferences (Kamishima, 2003), we show that the data-driven rank-breaking improves over Generalized Method-of-Moments (GMM) proposed by Azari Soufiani et al. (2013). This is widely used for rank aggregation, for instance in (Azari Soufiani et al., 2013; 2012; Maystre & Grossglauser, 2015b). The dataset consists of complete rankings over 10 types of sushi from n = 5000 individuals. We follow the experimental scenarios of the GMM in (Azari Soufiani et al., 2013) for fair comparisons. To validate our approach, we first take the estimated PL weights of the 10 types of sushi, using Hunter's implementation (Hunter, 2004) of the ML estimator, over the entire input data of 5000 complete rankings. We take the thus-created output as the ground truth θ∗. To create partial rankings and compare the performance of the data-driven rank-breaking to the state-of-the-art GMM approach in Figure 6, we first fix ℓ = 6 and vary n to simulate the top-ℓ-separators scenario by removing the known ordering among the bottom 10 − ℓ alternatives for each sample in the dataset (left). We next fix n = 1000 and vary ℓ to simulate top-ℓ-separators scenarios (right). Each point is averaged over 1000 instances. The mean squared error is plotted for both algorithms. |
| Researcher Affiliation | Academia | Ashish Khetan KHETAN2@ILLINOIS.EDU ISE Department, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA Sewoong Oh SWOH@ILLINOIS.EDU ISE Department, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | No explicit statement about releasing source code or a link to a code repository for the methodology described in this paper was found. |
| Open Datasets | Yes | On real-world datasets on sushi preferences (Kamishima, 2003) |
| Dataset Splits | No | "In all our experiments, unless otherwise stated, we fix d = 1024, n = 128000, κj = κ = 128, and ℓj = ℓ = 16, and each point is averaged over 100 instances. [...] Each point is averaged over 1000 instances." (Explanation: The paper describes averaging over instances but does not provide specific train/test/validation splits, percentages, or detailed cross-validation methodology.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | "Hunter's implementation (Hunter, 2004) of the ML estimator" (Explanation: The paper mentions a specific implementation but does not provide version numbers for any software dependencies or libraries used.) |
| Experiment Setup | Yes | In all our experiments, unless otherwise stated, we fix d = 1024, n = 128000, κj = κ = 128, and ℓj = ℓ = 16, and each point is averaged over 100 instances. Also, each sample is a partial ranking from a set of κ alternatives chosen uniformly at random, where the partial ranking is from a PL model with weights θ chosen i.i.d. uniformly over [−b, b] with b = 2. [...] To create partial rankings and compare the performance of the data-driven rank-breaking to the state-of-the-art GMM approach in Figure 6, we first fix ℓ = 6 and vary n to simulate the top-ℓ-separators scenario by removing the known ordering among the bottom 10 − ℓ alternatives for each sample in the dataset (left). We next fix n = 1000 and vary ℓ to simulate top-ℓ-separators scenarios (right). |
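The synthetic setup quoted in the Experiment Setup row (partial rankings drawn from a Plackett-Luce model with weights uniform over [−b, b]) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the top-ℓ truncation used to represent a partial ranking are our assumptions; parameter names (d, n, κ as `kappa`, ℓ as `ell`, b) follow the paper.

```python
import numpy as np

def sample_pl_ranking(theta, items, rng):
    """Draw a full Plackett-Luce ranking over `items` by sequential
    sampling without replacement, with selection probabilities
    proportional to exp(theta)."""
    remaining = list(items)
    ranking = []
    for _ in range(len(items)):
        p = np.exp(theta[remaining])
        p = p / p.sum()
        pick = rng.choice(len(remaining), p=p)
        ranking.append(remaining.pop(pick))
    return ranking

def make_dataset(d=1024, n=128000, kappa=128, ell=16, b=2.0, seed=0):
    """Generate n samples; each is a partial ranking over kappa
    alternatives chosen uniformly at random from d items, represented
    here (an assumption) as an ordered top-ell list plus an unordered
    remainder set. Weights theta ~ i.i.d. Uniform[-b, b]."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-b, b, size=d)
    samples = []
    for _ in range(n):
        items = rng.choice(d, size=kappa, replace=False)
        full = sample_pl_ranking(theta, items, rng)
        samples.append((full[:ell], set(full[ell:])))
    return theta, samples
```

Running the full d = 1024, n = 128000 configuration this way is slow in pure Python; the sketch is meant only to pin down how the quoted parameters interact (e.g., `make_dataset(d=20, n=5, kappa=8, ell=3)` for a toy run).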