An Analysis of Elo Rating Systems via Markov Chains
Authors: Sam Olesker-Taylor, Luca Zanetti
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a theoretical analysis of the Elo rating system... In Section 4, we also provide experimental results that showcase the usefulness of our strategy. |
| Researcher Affiliation | Academia | Sam Olesker-Taylor, Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK, sam.olesker-taylor@warwick.ac.uk; Luca Zanetti, Department of Mathematical Sciences, University of Bath, Bath, BA2 7AY, UK, lz2040@bath.ac.uk |
| Pseudocode | Yes | Definition 2.1 (Elo Process). Let $M \in \mathbb{R}$ and $n \ge 2$. Let $\rho \in [-M, M]^n$ with $\sum_k \rho_k = 0$. Let $q$ be a distribution on unordered pairs in $[n]$. Let $\eta \in (0, \tfrac{1}{4})$. A step of $\mathrm{Elo}_M(q, \rho; \eta)$ proceeds as follows. 0. Suppose that the current vector of ratings is $x \in \mathbb{R}^n$. 1. Choose unordered pair $\{I, J\}$ to play according to $q$: $\mathbb{P}[\{I, J\} = \{i, j\}] = q_{\{i,j\}}$ for all $i, j \in [n]$. 2. Suppose that Player $I$ beats $J$, which has probability $\sigma(\rho_I - \rho_J)$. Update ratings $x_I$ and $x_J$: $x_i \leftarrow x_i + \eta\,\sigma(x_j - x_i)$ and $x_j \leftarrow x_j - \eta\,\sigma(x_j - x_i)$. 3. Orthogonally project the full vector of ratings to $[-M, M]^n \cap \{x \in \mathbb{R}^n \mid \sum_k x_k = 0\}$. |
| Open Source Code | Yes | Answer: [Yes] Justification: code is included in the supplementary material. |
| Open Datasets | No | We sample the true ratings of the players according to independent Gaussians, with mean equal to 1 on one clique, mean 2 on the other, and standard deviation equal to 0.2 in both. The true ratings are sampled uniformly at random in [−1, 1]. The true ratings are sampled as follows: independent normal distributions of standard deviation 0.2 and mean 0 for the Erdős-Rényi at the bottom of the pyramid, mean 1 for the Erdős-Rényi in the middle, and mean 2 for the one at the top. The paper describes generating data, not using a publicly accessible dataset with concrete access information. |
| Dataset Splits | No | The paper describes generating synthetic data for simulations and does not mention explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU models, GPU models, or cloud resources used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'an existing library to compute a von Neumann-Birkhoff decomposition of a matrix' and that the code is included in supplementary material, but it does not specify any software names with version numbers. |
| Experiment Setup | Yes | We perform Elo simulations initialising the Elo ratings at zero and setting η = 0.1. We simulate up to 50000 matches for a number of players that goes from 100 to 1000. We repeat each experiment ten times, each time sampling new true ratings. |
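
The Elo step quoted in the Pseudocode row, combined with the zero initialisation and η = 0.1 from the Experiment Setup row, can be sketched in Python roughly as follows. This is a minimal illustration and not the authors' supplementary code. It assumes that σ is the logistic function σ(z) = 1/(1 + e^(−z)), which the quoted definition does not spell out. The helper names (`sigmoid`, `project`, `elo_step`), the four-player complete graph with uniform `q`, the value M = 1, and the particular true ratings `rho` are all hypothetical choices made for the example. The projection in step 3 is computed by bisection on a common shift followed by clipping, which is one standard way to project onto the intersection of a box and the zero-sum hyperplane.

```python
import itertools
import numpy as np


def sigmoid(z):
    # Assumed form of sigma: the logistic function.
    return 1.0 / (1.0 + np.exp(-z))


def project(x, M, iters=100):
    """Euclidean projection of x onto [-M, M]^n intersected with {sum = 0}.

    The projection has the form clip(x - tau, -M, M) for a scalar shift tau;
    the clipped sum is non-increasing in tau, so tau is found by bisection.
    """
    lo, hi = x.min() - M, x.max() + M  # clipped sum is n*M at lo, -n*M at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(x - tau, -M, M).sum() > 0:
            lo = tau
        else:
            hi = tau
    return np.clip(x - 0.5 * (lo + hi), -M, M)


def elo_step(x, rho, pairs, q, eta, M, rng):
    """One step of the Elo process: sample a pair, play, update, project."""
    i, j = pairs[rng.choice(len(pairs), p=q)]
    # Player i beats player j with probability sigma(rho_i - rho_j).
    if rng.random() >= sigmoid(rho[i] - rho[j]):
        i, j = j, i  # relabel so that i is always the winner
    delta = eta * sigmoid(x[j] - x[i])
    x = x.copy()
    x[i] += delta  # winner gains
    x[j] -= delta  # loser loses the same amount
    return project(x, M)


# Hypothetical toy instance: 4 players, uniform q on all pairs, M = 1.
rng = np.random.default_rng(0)
rho = np.array([-0.5, -0.1, 0.2, 0.4])  # true ratings, summing to zero
pairs = list(itertools.combinations(range(len(rho)), 2))
q = np.full(len(pairs), 1.0 / len(pairs))

x = np.zeros(len(rho))  # Elo ratings initialised at zero
for _ in range(2000):
    x = elo_step(x, rho, pairs, q, eta=0.1, M=1.0, rng=rng)
print("Elo ratings after 2000 matches:", np.round(x, 3))
```

Because the update adds and subtracts the same amount, the ratings keep summing to zero on their own; the projection only has an effect when a rating would leave [−M, M]. The experiments summarised in the table use 100 to 1000 players and up to 50000 matches, so the toy sizes here are for illustration only.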