Information Directed Sampling for Stochastic Bandits With Graph Feedback
Authors: Fang Liu, Swapna Buccapatnam, Ness Shroff
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, using numerical evaluations, we demonstrate that our proposed IDS policies outperform existing approaches, including adaptations of upper confidence bound, ϵ-greedy and Exp3 algorithms. |
| Researcher Affiliation | Collaboration | Fang Liu, The Ohio State University, Columbus, Ohio 43210 (liu.3977@osu.edu); Swapna Buccapatnam, AT&T Labs Research, Middletown, NJ 07748 (sb646f@att.com); Ness Shroff, The Ohio State University, Columbus, Ohio 43210 (shroff.11@osu.edu) |
| Pseudocode | Yes | Algorithm 1 Meta-algorithm for Information Directed Sampling with Graph Feedback (a hedged simulation sketch follows the table) |
| Open Source Code | No | The paper does not provide any concrete access (links, explicit statements) to open-source code for the described methodology. |
| Open Datasets | No | Section 7 'Numerical Results' describes a simulated Beta-Bernoulli bandit problem where reward values are drawn from a Beta(1,1) distribution. This is an internally generated simulation environment, not a publicly available dataset with concrete access information (link, citation). |
| Dataset Splits | No | The paper reports running experiments over a time horizon T and averaging results over 1000 trials, but it does not specify training, validation, or test splits; the data are generated by simulation rather than drawn from a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing 'Algorithm 2 in Russo and Van Roy (2014)' and using 'Beta-Bernoulli bandits', but does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | In the experiment, we set K = 5 and T = 1000. All the regret results are averaged over 1000 trials. |
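The rows above reference the paper's IDS meta-algorithm (Algorithm 1), the simulated Beta-Bernoulli environment, and the reported setup of K = 5 and T = 1000 averaged over 1000 trials. Below is a minimal, hypothetical sketch of a sample-based variance-IDS policy with graph feedback in that spirit; it is not the authors' code. Only K, T, and the Beta(1,1) priors come from the paper. The random feedback graph, the posterior sample count M, the variance-based information-gain estimate, and the deterministic arm choice (in place of the two-point randomization used by full IDS) are simplifying assumptions.

```python
import numpy as np

K, T, M = 5, 1000, 1000          # arms, horizon, posterior samples per round
rng = np.random.default_rng(0)

# Feedback graph as a boolean adjacency matrix: playing arm i reveals the
# reward of every arm j with G[i, j] = True. Self-loops keep each arm's own
# reward observable; the extra random edges are purely hypothetical.
G = np.eye(K, dtype=bool)
G |= rng.random((K, K)) < 0.3

theta = rng.beta(1, 1, size=K)   # true Bernoulli means, drawn from Beta(1,1)
alpha = np.ones(K)               # Beta posterior parameters per arm
beta = np.ones(K)

cum_regret = 0.0
for t in range(T):
    # Monte Carlo samples from the posterior over the mean vector.
    samples = rng.beta(alpha, beta, size=(M, K))
    best = samples.argmax(axis=1)                 # optimal arm per sample
    p_star = np.bincount(best, minlength=K) / M   # P(arm a is optimal)

    mu = alpha / (alpha + beta)                   # posterior means
    rho_star = samples.max(axis=1).mean()         # E[theta_{a*}]
    delta = rho_star - mu                         # per-arm expected regret

    # Variance-based information gain of observing arm j:
    # sum_a P(a* = a) * (E[theta_j | a* = a] - E[theta_j])^2.
    gain_obs = np.zeros(K)
    for a in range(K):
        if p_star[a] > 0:
            cond = samples[best == a].mean(axis=0)
            gain_obs += p_star[a] * (cond - mu) ** 2

    # Graph feedback: playing arm i reveals its whole out-neighborhood,
    # so its gain aggregates the observation gains of all revealed arms.
    gain = G.astype(float) @ gain_obs

    # Deterministic surrogate for the IDS step: pick the arm minimizing the
    # information ratio delta^2 / gain (full IDS optimizes this ratio over
    # distributions supported on at most two arms).
    arm = int(np.argmin(delta ** 2 / np.maximum(gain, 1e-12)))

    cum_regret += theta.max() - theta[arm]
    for j in np.flatnonzero(G[arm]):              # observe all revealed arms
        r = rng.random() < theta[j]
        alpha[j] += r
        beta[j] += 1 - r

print(f"cumulative regret after T={T} rounds: {cum_regret:.1f}")
```

Averaging `cum_regret` over many independent runs, as the paper does with 1000 trials, would produce the kind of averaged regret curves reported in the Numerical Results section.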