Information Directed Sampling for Stochastic Bandits With Graph Feedback

Authors: Fang Liu, Swapna Buccapatnam, Ness Shroff

AAAI 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, using numerical evaluations, we demonstrate that our proposed IDS policies outperform existing approaches, including adaptions of upper confidence bound, ϵ-greedy and Exp3 algorithms." |
| Researcher Affiliation | Collaboration | Fang Liu, The Ohio State University, Columbus, Ohio 43210 (liu.3977@osu.edu); Swapna Buccapatnam, AT&T Labs Research, Middletown, NJ 07748 (sb646f@att.com); Ness Shroff, The Ohio State University, Columbus, Ohio 43710 (shroff.11@osu.edu) |
| Pseudocode | Yes | "Algorithm 1: Meta-algorithm for Information Directed Sampling with Graph Feedback" |
| Open Source Code | No | The paper does not provide any concrete access (links or explicit statements) to open-source code for the described methodology. |
| Open Datasets | No | Section 7, "Numerical Results", describes a simulated Beta-Bernoulli bandit problem in which the mean rewards are drawn from a Beta(1, 1) distribution. This is an internally generated simulation environment, not a publicly available dataset with concrete access information (link or citation). |
| Dataset Splits | No | The paper reports running experiments for a given time horizon T and averaging results over 1000 trials, but it does not specify training, validation, or test splits in the conventional sense, since the data is generated through simulation rather than drawn from a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing "Algorithm 2 in Russo and Van Roy (2014)" and using Beta-Bernoulli bandits, but does not list any software dependencies with version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | "In the experiment, we set K = 5 and T = 1000. All the regret results are averaged over 1000 trials." |
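
The table shows that the paper gives pseudocode (Algorithm 1) and a precise simulation setup (K = 5, T = 1000, Beta(1, 1) Beta-Bernoulli bandits, 1000 trials) but no code, data, or dependency list. For readers attempting a reproduction, below is a minimal NumPy sketch of a single trial. It is a sketch under stated assumptions, not the authors' implementation: the variance-based information-gain estimate follows the style of Russo and Van Roy (2014), which the paper builds on, and the complete feedback graph and Monte Carlo sample size M are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def ids_graph_bandit(G, mu_true, T=1000, M=1000, n_grid=101):
    """One trial of sample-based IDS on a Beta-Bernoulli bandit with
    graph feedback. G[a] is the set of arms observed when arm a is
    played (assumed to contain a itself, i.e., self-loops)."""
    K = len(mu_true)
    alpha, beta = np.ones(K), np.ones(K)    # Beta(1, 1) posteriors
    mu_star = mu_true.max()
    qs = np.linspace(0.0, 1.0, n_grid)      # grid for the mixing weight
    regret = np.zeros(T)

    for t in range(T):
        # Monte Carlo posterior samples: theta[m, a] ~ Beta(alpha_a, beta_a)
        theta = rng.beta(alpha, beta, size=(M, K))
        best = theta.argmax(axis=1)         # sampled optimal arm per draw

        # Expected instantaneous regret: Delta_a = E[max_b theta_b] - E[theta_a]
        delta = theta.max(axis=1).mean() - theta.mean(axis=0)

        # Variance-based information-gain proxy per *observable* arm:
        # v_b = Var_{A*}( E[theta_b | A* = a] ), estimated from the samples
        mu_hat = theta.mean(axis=0)
        v = np.zeros(K)
        for a in range(K):
            mask = best == a
            if mask.any():
                v += mask.mean() * (theta[mask].mean(axis=0) - mu_hat) ** 2

        # Graph feedback: playing a reveals the reward of every arm in G[a]
        gain = np.array([v[list(G[a])].sum() for a in range(K)])
        gain = np.maximum(gain, 1e-12)      # guard against division by zero

        # IDS step: the information-ratio minimizer randomizes over at most
        # two arms, so search all pairs with a grid over the weight q
        best_ratio, pick = np.inf, (0, 0, 1.0)
        for a in range(K):
            for b in range(K):
                d = qs * delta[a] + (1 - qs) * delta[b]
                g = qs * gain[a] + (1 - qs) * gain[b]
                ratio = d ** 2 / g
                i = ratio.argmin()
                if ratio[i] < best_ratio:
                    best_ratio, pick = ratio[i], (a, b, qs[i])

        a, b, q = pick
        action = a if rng.random() < q else b
        regret[t] = mu_star - mu_true[action]

        # Update the posterior of every arm whose reward was observed
        for o in G[action]:
            r = rng.random() < mu_true[o]   # Bernoulli(mu_o) reward
            alpha[o] += r
            beta[o] += 1 - r

    return regret.cumsum()

# Setup mirroring the paper's experiment: K = 5, T = 1000, means drawn
# from Beta(1, 1); a complete feedback graph is one illustrative choice.
K = 5
mu = rng.beta(1.0, 1.0, size=K)
G_full = {a: set(range(K)) for a in range(K)}
print(ids_graph_bandit(G_full, mu)[-1])    # cumulative regret after T rounds
```

The pair-plus-grid search uses the fact that the information-ratio minimizer can be taken to randomize over at most two actions; the paper's reported curves additionally average 1000 such trials and compare against adaptations of UCB, ϵ-greedy, and Exp3.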