An Analysis of Elo Rating Systems via Markov Chains

Authors: Sam Olesker-Taylor, Luca Zanetti

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a theoretical analysis of the Elo rating system... In §4, we also provide experimental results that showcase the usefulness of our strategy.
Researcher Affiliation | Academia | Sam Olesker-Taylor, Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK, sam.olesker-taylor@warwick.ac.uk; Luca Zanetti, Department of Mathematical Sciences, University of Bath, Bath, BA2 7AY, UK, lz2040@bath.ac.uk
Pseudocode | Yes | Definition 2.1 (Elo Process). Let $M \in \mathbb{R}$ and $n \ge 2$. Let $\rho \in [-M, M]^n$ with $\sum_k \rho_k = 0$. Let $q$ be a distribution on unordered pairs in $[n]$. Let $\eta \in (0, \tfrac{1}{4})$. A step of $\mathrm{Elo}_M(q, \rho; \eta)$ proceeds as follows. 0. Suppose that the current vector of ratings is $x \in \mathbb{R}^n$. 1. Choose unordered pair $\{I, J\}$ to play according to $q$: $\mathbb{P}[\{I, J\} = \{i, j\}] = q_{\{i,j\}}$ for all $i, j \in [n]$. 2. Suppose that Player $I$ beats $J$, which has probability $\sigma(\rho_I - \rho_J)$. Update ratings $x_I$ and $x_J$: $x_i \leftarrow x_i + \eta\sigma(x_j - x_i)$ and $x_j \leftarrow x_j - \eta\sigma(x_j - x_i)$. 3. Orthogonally project the full vector of ratings to $[-M, M]^n \cap \{x \in \mathbb{R}^n \mid \sum_k x_k = 0\}$.
Open Source Code | Yes | Answer: [Yes] Justification: code is included in the supplementary material.
Open Datasets | No | We sample the true ratings of the players according to independent Gaussians, with mean equal to 1 on one clique, mean 2 on the other, and standard deviation equal to 0.2 in both. The true ratings are sampled uniformly at random in [-1, 1]. The true ratings are sampled as follows: independent normal distributions of standard deviation 0.2 and mean 0 for the Erdős-Rényi at the bottom of the pyramid, mean 1 for the Erdős-Rényi in the middle, and mean 2 for the one at the top. The paper describes generating data, not using a publicly accessible dataset with concrete access information.
Dataset Splits | No | The paper describes generating synthetic data for simulations and does not mention explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU models, GPU models, or cloud resources used for running the experiments.
Software Dependencies | No | The paper mentions using 'an existing library to compute a von Neumann-Birkhoff decomposition of a matrix' and that the code is included in supplementary material, but it does not specify any software names with version numbers.
Experiment Setup | Yes | We perform Elo simulations initialising the Elo ratings at zero and setting η = 0.1. We simulate up to 50000 matches for a number of players that goes from 100 to 1000. We repeat each experiment ten times, each time sampling new true ratings.
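
The Elo step quoted in the Pseudocode row, combined with the zero initialisation and η = 0.1 from the Experiment Setup row, can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' supplementary code. Several pieces are assumptions not stated in the excerpt: σ is taken to be the logistic function, the case where J beats I is handled by swapping the two roles, and the projection is computed by bisection on a common shift. The names sigmoid, project, elo_step and simulate are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Assumption: sigma is the logistic function 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))


def project(x, M, iters=100):
    """Orthogonal projection onto the box [-M, M]^n intersected with {sum = 0}.

    The projection has the form clip(x_k - tau, -M, M) for a common shift tau.
    The clipped sum is non-increasing in tau, so tau is found by bisection.
    """
    def clipped(tau):
        return [min(M, max(-M, v - tau)) for v in x]

    lo, hi = min(x) - M, max(x) + M  # clipped sum is >= 0 at lo and <= 0 at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(clipped(mid)) > 0:
            lo = mid
        else:
            hi = mid
    return clipped(0.5 * (lo + hi))


def elo_step(x, rho, pairs, weights, eta, M, rng):
    """One step of the Elo process; returns the new rating vector."""
    # Step 1: draw the unordered pair according to q (given as weights).
    i, j = rng.choices(pairs, weights=weights, k=1)[0]
    # Step 2: i wins with probability sigma(rho_i - rho_j); otherwise the
    # roles are swapped (the symmetric case is an assumption of this sketch).
    if rng.random() < sigmoid(rho[i] - rho[j]):
        winner, loser = i, j
    else:
        winner, loser = j, i
    delta = eta * sigmoid(x[loser] - x[winner])
    y = list(x)
    y[winner] += delta
    y[loser] -= delta
    # Step 3: project back onto the constraint set.
    return project(y, M)


def simulate(rho, eta, M, steps, seed=0):
    """Run the process from zero ratings with q uniform over all pairs."""
    rng = random.Random(seed)
    n = len(rho)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    weights = [1.0] * len(pairs)  # assumption: uniform q, for illustration only
    x = [0.0] * n
    for _ in range(steps):
        x = elo_step(x, rho, pairs, weights, eta, M, rng)
    return x
```

For example, simulate([-1.0, -0.5, 0.5, 1.0], eta=0.1, M=3.0, steps=5000) returns a rating vector that sums to zero (up to floating-point error) and stays inside [-3, 3]. The uniform choice of q is only a placeholder; a non-uniform match-up distribution can be passed to elo_step through the weights argument.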