Optimistic Gittins Indices

Authors: Eli Gutin, Vivek Farias

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We show empirically that even the simplest optimistic approximation to the Gittins index proposed here outperforms the state-of-the-art incumbent schemes discussed in this introduction by a non-trivial margin. We view this as our primary contribution the Bayesian MAB problem is fundamental making the performance improvements we demonstrate important.
Researcher Affiliation Academia Eli Gutin Operations Research Center, MIT Cambridge, MA 02142 gutin@mit.edu Vivek F. Farias MIT Sloan School of Management Cambridge, MA 02142 vivekf@mit.edu
Pseudocode Yes We summarize the Optimistic Gittins Index (OGI) algorithm succinctly as follows. Assume the state of arm i at time t is given by yi,t, and let γt = 1 1/t. Play an arm i 2 arg max and update the posterior on the arm based on the observed reward.
Open Source Code No The paper does not provide any statement about making its source code publicly available, nor does it include any links to a code repository.
Open Datasets No The paper describes experiments involving simulated data generated from Gaussian and Bernoulli distributions (e.g., 'arms generate Gaussian rewards Xi,t N( i, 1)', 'Bernoulli parameter, over 1000 independent trials'), but it does not use or provide access to a specific named public dataset with a link, DOI, or citation.
Dataset Splits No The paper describes its experiments in terms of 'trials' and 'time periods' (e.g., 'simulate 1000 independent trials with 10 arms and 1000 time periods') rather than specifying training, validation, or test dataset splits. No information on data partitioning is provided.
Hardware Specification No The paper mentions 'CPU time' in the context of computational burden but does not provide any specific details about the hardware used for running experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies No The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the experiments.
Experiment Setup Yes We used a common discount factor schedule in all experiments setting γt = 1 1/(100 + t). The choice of = 100 is second order and our conclusions remain unchanged (and actually appear to improve in an absolute sense) with other choices (we show this in a second set of experiments). The implementation of OGI in this experiment uses K = 1. We simulate 1000 independent trials with 10 arms and 1000 time periods. [...] regret is simulated over 1000 periods, with 10 arms each having a uniformly distributed Bernoulli parameter, over 1000 independent trials.