Distributed Multi-Player Bandits - a Game of Thrones Approach
Authors: Ilai Bistritz, Amir Leshem
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We simulated a multi-armed bandit game with $\{\mu_{n,i}\}$ that are chosen independently and uniformly at random in $[0.05, 0.95]$. The rewards are generated as $r_{n,i}(t) = \mu_{n,i} + z_{n,i}(t)$ where $\{z_{n,i}(t)\}$ are independent and uniformly distributed on $[-0.05, 0.05]$ for each $n, i$. In Fig. 2, we present the sample mean of the accumulated sum of utilities $\sum_{n=1}^{N} \frac{1}{t} \sum_{\tau=1}^{t} u_n(a(\tau))$ as a function of time $t$, averaged over 100 experiments. |
| Researcher Affiliation | Academia | Ilai Bistritz, Stanford University, bistritz@stanford.edu; Amir Leshem, Bar Ilan University, Amir.Leshem@biu.ac.il |
| Pseudocode | Yes | Algorithm 1 (Game of Thrones Algorithm) and Algorithm 2 (Game of Thrones Dynamics) |
| Open Source Code | No | The paper does not provide any explicit statements about the release of open-source code, nor does it include a link to a code repository. |
| Open Datasets | No | The paper describes a simulated environment where data is generated for experiments rather than using or providing a publicly available dataset. It states: 'We simulated a multi-armed bandit game with $\{\mu_{n,i}\}$ that are chosen independently and uniformly at random in $[0.05, 0.95]$. The rewards are generated as $r_{n,i}(t) = \mu_{n,i} + z_{n,i}(t)$ where $\{z_{n,i}(t)\}$ are independent and uniformly distributed on $[-0.05, 0.05]$ for each $n, i$.' |
| Dataset Splits | No | The paper does not mention train/validation/test dataset splits. It describes an online learning framework with 'exploration', 'Game of Thrones (GoT)', and 'exploitation' phases within its simulation. |
| Hardware Specification | No | The paper describes its simulations ('We simulated a multi-armed bandit game...') but does not provide any specific details about the hardware (e.g., CPU, GPU, memory) used to conduct these simulations. |
| Software Dependencies | No | The paper describes numerical simulations but does not specify any software dependencies or their version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | Hence we choose $c_1 = 1000$, $c_2 = c_3 = 6000$. We use $\rho = 1/2$ in the simulations we present, since the performance is very similar for $\rho$ values not too close to zero or one. We use $c = N$, that gives the highest possible escape probability of $\varepsilon^c$ from a content state. |
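
The simulated environment quoted in the table can be sketched in a few lines. The following is a minimal illustration, not the authors' code (none was released): it draws the expected rewards and the noise as described and computes the plotted quantity, the sample mean of the summed utilities, for a given action sequence. The function names, the `policy` callback, and the rule that colliding players receive zero utility are assumptions made for this sketch; the Game of Thrones algorithm itself is not implemented here.

```python
import random


def draw_means(n_players, n_arms, rng):
    """Expected rewards mu[n][i], i.i.d. uniform on [0.05, 0.95]."""
    return [[rng.uniform(0.05, 0.95) for _ in range(n_arms)]
            for _ in range(n_players)]


def draw_reward(mu_ni, rng):
    """One reward r = mu + z, with noise z uniform on [-0.05, 0.05]."""
    return mu_ni + rng.uniform(-0.05, 0.05)


def utilities(mu, actions, rng):
    """Per-player utility for one round.

    ASSUMPTION (not stated in the table): a player whose arm is also
    chosen by another player collides and receives zero utility.
    """
    counts = {}
    for arm in actions:
        counts[arm] = counts.get(arm, 0) + 1
    return [draw_reward(mu[n][arm], rng) if counts[arm] == 1 else 0.0
            for n, arm in enumerate(actions)]


def running_mean_sum_utility(mu, policy, horizon, rng):
    """Return the curve sum_n (1/t) sum_{tau<=t} u_n(a(tau)) for t = 1..horizon.

    `policy` is a hypothetical callback mapping the round index t
    to a list with one arm index per player.
    """
    curve, cumulative = [], 0.0
    for t in range(1, horizon + 1):
        cumulative += sum(utilities(mu, policy(t), rng))
        curve.append(cumulative / t)
    return curve


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4
    mu = draw_means(n, n, rng)
    # Placeholder policy: player k always plays arm k, so nobody collides.
    curve = running_mean_sum_utility(mu, lambda t: list(range(n)), 1000, rng)
    print(curve[-1])
```

To mirror the reported figure, one would repeat this over 100 independent runs and average the resulting curves pointwise, substituting the actual learning algorithm for the placeholder policy.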