Concentric mixtures of Mallows models for top-$k$ rankings: sampling and identifiability
Authors: Fabien Collas, Ekhine Irurozki
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate empirically our proposal. The experimental framework is as follows. In the first two experiments, we generate a sample of partial rankings, using Algorithm 1, with parameters n = 30 and k = 10, from a mixture of concentric MM, both centered at a random σ0 and with two dispersion parameters, θb, θg. The mixture parameter is denoted r. |
| Researcher Affiliation | Academia | 1Basque Center for Applied Mathematics, Bilbao, Spain. 2LTCI, Telecom Paris, Institut Polytechnique de Paris. |
| Pseudocode | Yes | Algorithm 1 Sample top-k in O(k log k) Data: n, k, θ, σ0 Result: σ: Top-k ranking of n items distributed according to M(σ0, θ) for j [1, k] do Vj(πσ 1 0 ) = random choice in [n j] with choice probabilities of Eq. (3) πσ 1 0 = transform V (πσ 1 0 ) with the bijection in (Mc Clellan et al., 1974) return π 1 end |
| Open Source Code | Yes | Software implementing the algorithms described here is distributed in https://github.com/ ekhiru/top-k-mallows. |
| Open Datasets | Yes | To test the identifiability on real data, we used a dataset already used in (Fligner & Verducci, 1986), for which 98 college students were asked to rank five words according to its strength of association with the word idea . |
| Dataset Splits | No | The paper describes generating samples for experiments and their sizes (e.g., “mg = 40 rankings from a M(σ0, θg)”, “using the same growing sample, with size {1, 2, 3, ..., 44}”), but does not provide specific train/validation/test splits or methodology for data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper provides a link to its software implementation but does not list specific software dependencies (e.g., libraries, frameworks) with their version numbers required for reproduction. |
| Experiment Setup | Yes | In the first two experiments, we generate a sample of partial rankings, using Algorithm 1, with parameters n = 30 and k = 10, from a mixture of concentric MM, both centered at a random σ0 and with two dispersion parameters, θb, θg. The mixture parameter is denoted r. ... mg = 40 rankings from a M(σ0, θg) such that E[d(γ, σ0)] {3, 8, 13, . . . , 48} mb = 60 rankings from a M(σ0, θb) such that E[d(β, σ0)] = c E[d(γ, σ0)] with 40 > c 3 and E[d(γ, σ0)] 217 (bound corresponding to the uniform distribution). |