Concentric mixtures of Mallows models for top-$k$ rankings: sampling and identifiability

Authors: Fabien Collas, Ekhine Irurozki

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we validate empirically our proposal. The experimental framework is as follows. In the first two experiments, we generate a sample of partial rankings, using Algorithm 1, with parameters n = 30 and k = 10, from a mixture of concentric MM, both centered at a random σ0 and with two dispersion parameters, θb, θg. The mixture parameter is denoted r.
Researcher Affiliation Academia 1Basque Center for Applied Mathematics, Bilbao, Spain. 2LTCI, Telecom Paris, Institut Polytechnique de Paris.
Pseudocode Yes Algorithm 1 Sample top-k in O(k log k) Data: n, k, θ, σ0 Result: σ: Top-k ranking of n items distributed according to M(σ0, θ) for j [1, k] do Vj(πσ 1 0 ) = random choice in [n j] with choice probabilities of Eq. (3) πσ 1 0 = transform V (πσ 1 0 ) with the bijection in (Mc Clellan et al., 1974) return π 1 end
Open Source Code Yes Software implementing the algorithms described here is distributed in https://github.com/ ekhiru/top-k-mallows.
Open Datasets Yes To test the identifiability on real data, we used a dataset already used in (Fligner & Verducci, 1986), for which 98 college students were asked to rank five words according to its strength of association with the word idea .
Dataset Splits No The paper describes generating samples for experiments and their sizes (e.g., “mg = 40 rankings from a M(σ0, θg)”, “using the same growing sample, with size {1, 2, 3, ..., 44}”), but does not provide specific train/validation/test splits or methodology for data partitioning.
Hardware Specification No The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies No The paper provides a link to its software implementation but does not list specific software dependencies (e.g., libraries, frameworks) with their version numbers required for reproduction.
Experiment Setup Yes In the first two experiments, we generate a sample of partial rankings, using Algorithm 1, with parameters n = 30 and k = 10, from a mixture of concentric MM, both centered at a random σ0 and with two dispersion parameters, θb, θg. The mixture parameter is denoted r. ... mg = 40 rankings from a M(σ0, θg) such that E[d(γ, σ0)] {3, 8, 13, . . . , 48} mb = 60 rankings from a M(σ0, θb) such that E[d(β, σ0)] = c E[d(γ, σ0)] with 40 > c 3 and E[d(γ, σ0)] 217 (bound corresponding to the uniform distribution).