Optimistic Gittins Indices
Authors: Eli Gutin, Vivek Farias
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that even the simplest optimistic approximation to the Gittins index proposed here outperforms the state-of-the-art incumbent schemes discussed in this introduction by a non-trivial margin. We view this as our primary contribution; the Bayesian MAB problem is fundamental, making the performance improvements we demonstrate important. |
| Researcher Affiliation | Academia | Eli Gutin Operations Research Center, MIT Cambridge, MA 02142 gutin@mit.edu Vivek F. Farias MIT Sloan School of Management Cambridge, MA 02142 vivekf@mit.edu |
| Pseudocode | Yes | We summarize the Optimistic Gittins Index (OGI) algorithm succinctly as follows. Assume the state of arm i at time t is given by yi,t, and let γt = 1 − 1/t. Play an arm i ∈ arg max and update the posterior on the arm based on the observed reward. |
| Open Source Code | No | The paper does not provide any statement about making its source code publicly available, nor does it include any links to a code repository. |
| Open Datasets | No | The paper describes experiments involving simulated data generated from Gaussian and Bernoulli distributions (e.g., 'arms generate Gaussian rewards Xi,t ∼ N(θi, 1)', 'Bernoulli parameter, over 1000 independent trials'), but it does not use or provide access to a specific named public dataset with a link, DOI, or citation. |
| Dataset Splits | No | The paper describes its experiments in terms of 'trials' and 'time periods' (e.g., 'simulate 1000 independent trials with 10 arms and 1000 time periods') rather than specifying training, validation, or test dataset splits. No information on data partitioning is provided. |
| Hardware Specification | No | The paper mentions 'CPU time' in the context of computational burden but does not provide any specific details about the hardware used for running experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | We used a common discount factor schedule in all experiments, setting γt = 1 − 1/(100 + t). The choice of 100 is second order and our conclusions remain unchanged (and actually appear to improve in an absolute sense) with other choices (we show this in a second set of experiments). The implementation of OGI in this experiment uses K = 1. We simulate 1000 independent trials with 10 arms and 1000 time periods. [...] regret is simulated over 1000 periods, with 10 arms each having a uniformly distributed Bernoulli parameter, over 1000 independent trials. |
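The experiment setup quoted above (Bernoulli arms with uniform priors, the discount schedule γt = 1 − 1/(100 + t), index-based arm selection, and posterior updates after each pull) can be sketched as a small simulation harness. This is a minimal sketch, not the paper's implementation: `placeholder_index` is a hypothetical stand-in for the one-step (K = 1) Optimistic Gittins Index, which in the paper is obtained by solving a one-dimensional fixed-point equation; everything else (`run_trial`, the regret bookkeeping, the seed) is illustrative.

```python
import random


def beta_mean(a, b):
    """Posterior mean of a Beta(a, b) distribution."""
    return a / (a + b)


def placeholder_index(a, b, gamma_t):
    # Hypothetical stand-in for the K = 1 Optimistic Gittins Index.
    # The paper's index depends on the posterior (a, b) and the discount
    # factor gamma_t; here we use posterior mean plus a posterior-spread
    # bonus purely to make the loop runnable.
    m = beta_mean(a, b)
    return m + (m * (1 - m) / (a + b)) ** 0.5


def run_trial(true_means, horizon=1000, seed=0):
    """One trial: play the arm with the largest index, update its posterior,
    and accumulate (undiscounted) regret against the best arm."""
    rng = random.Random(seed)
    n = len(true_means)
    post = [[1, 1] for _ in range(n)]  # Beta(1, 1): uniform prior per arm
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        gamma_t = 1 - 1 / (100 + t)  # discount schedule from the paper
        indices = [placeholder_index(a, b, gamma_t) for a, b in post]
        i = max(range(n), key=lambda j: indices[j])
        reward = 1 if rng.random() < true_means[i] else 0
        post[i][0] += reward
        post[i][1] += 1 - reward
        regret += best - true_means[i]
    return regret
```

The paper's experiment would average such regret trajectories over 1000 independent trials, with each arm's Bernoulli parameter drawn uniformly at random per trial.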