Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Concentric mixtures of Mallows models for top-$k$ rankings: sampling and identifiability
Authors: Fabien Collas, Ekhine Irurozki
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate empirically our proposal. The experimental framework is as follows. In the first two experiments, we generate a sample of partial rankings, using Algorithm 1, with parameters n = 30 and k = 10, from a mixture of concentric MM, both centered at a random σ0 and with two dispersion parameters, θb, θg. The mixture parameter is denoted r. |
| Researcher Affiliation | Academia | 1Basque Center for Applied Mathematics, Bilbao, Spain. 2LTCI, Telecom Paris, Institut Polytechnique de Paris. |
| Pseudocode | Yes | Algorithm 1: Sample top-k in O(k log k). Data: n, k, θ, σ0. Result: σ: top-k ranking of n items distributed according to M(σ0, θ). for j ∈ [1, k] do Vj(πσ0⁻¹) = random choice in [n − j] with choice probabilities of Eq. (3); πσ0⁻¹ = transform V(πσ0⁻¹) with the bijection in (McClellan et al., 1974); return π⁻¹ end |
| Open Source Code | Yes | Software implementing the algorithms described here is distributed in https://github.com/ekhiru/top-k-mallows. |
| Open Datasets | Yes | To test the identifiability on real data, we used a dataset already used in (Fligner & Verducci, 1986), for which 98 college students were asked to rank five words according to its strength of association with the word “idea”. |
| Dataset Splits | No | The paper describes generating samples for experiments and their sizes (e.g., “mg = 40 rankings from a M(σ0, θg)”, “using the same growing sample, with size {1, 2, 3, ..., 44}”), but does not provide specific train/validation/test splits or methodology for data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper provides a link to its software implementation but does not list specific software dependencies (e.g., libraries, frameworks) with their version numbers required for reproduction. |
| Experiment Setup | Yes | In the first two experiments, we generate a sample of partial rankings, using Algorithm 1, with parameters n = 30 and k = 10, from a mixture of concentric MM, both centered at a random σ0 and with two dispersion parameters, θb, θg. The mixture parameter is denoted r. ... mg = 40 rankings from a M(σ0, θg) such that E[d(γ, σ0)] ∈ {3, 8, 13, ..., 48}; mb = 60 rankings from a M(σ0, θb) such that E[d(β, σ0)] = c E[d(γ, σ0)] with 40 > c ≥ 3 and E[d(γ, σ0)] ≤ 217 (bound corresponding to the uniform distribution). |
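
The sampling procedure and experiment setup quoted in the table can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the linked repository), and it rests on several assumptions:

- Eq. (3) is not reproduced on this page, so the sketch assumes the standard Kendall's-tau Mallows factorization, in which the j-th inversion-vector entry takes value r with probability proportional to exp(−θ·r).
- The function names `sample_top_k_mallows` and `sample_concentric_mixture` are hypothetical.
- The dispersion values are placeholders. The paper selects θg and θb through target expected distances, which this sketch does not do.
- The list-based decoding costs O(kn), not the O(k log k) of Algorithm 1.

```python
import math
import random


def sample_top_k_mallows(n, k, theta, sigma0, rng=random):
    """Draw the top-k items of a ranking from a Mallows model M(sigma0, theta).

    Assumption (not taken from the paper's Eq. (3)): at step j, with m = n - j + 1
    items still unranked, the inversion-vector entry V_j takes value r in
    {0, ..., m - 1} with probability proportional to exp(-theta * r).
    """
    remaining = list(sigma0)  # unranked items, kept in reference order
    top_k = []
    for j in range(1, k + 1):
        m = n - j + 1  # number of items still unranked
        weights = [math.exp(-theta * r) for r in range(m)]
        v_j = rng.choices(range(m), weights=weights)[0]
        # Lehmer-code style decoding: take the v_j-th remaining item.
        top_k.append(remaining.pop(v_j))
    return top_k


def sample_concentric_mixture(n, k, sigma0, theta_g, theta_b, m_g, m_b, rng=random):
    """Draw m_g rankings from M(sigma0, theta_g) and m_b from M(sigma0, theta_b).

    Both components share the same central ranking sigma0 (concentric mixture).
    """
    good = [sample_top_k_mallows(n, k, theta_g, sigma0, rng) for _ in range(m_g)]
    bad = [sample_top_k_mallows(n, k, theta_b, sigma0, rng) for _ in range(m_b)]
    return good + bad


# Sizes follow the quoted setup (n = 30, k = 10, mg = 40, mb = 60);
# the theta values below are illustrative placeholders only.
rng = random.Random(0)
sigma0 = list(range(30))
rng.shuffle(sigma0)  # random central ranking
sample = sample_concentric_mixture(30, 10, sigma0, theta_g=0.5, theta_b=0.05,
                                   m_g=40, m_b=60, rng=rng)
print(len(sample), sample[0])
```

Passing a seeded `random.Random` instance keeps a run repeatable, which matters here because the table reports no seeds, hardware, or dependency versions.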