Information Directed Sampling for Stochastic Bandits With Graph Feedback

Authors: Fang Liu, Swapna Buccapatnam, Ness Shroff

AAAI 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, using numerical evaluations, we demonstrate that our proposed IDS policies outperform existing approaches, including adaptions of upper confidence bound, ϵ-greedy and Exp3 algorithms." |
| Researcher Affiliation | Collaboration | Fang Liu, The Ohio State University, Columbus, Ohio 43210 (liu.3977@osu.edu); Swapna Buccapatnam, AT&T Labs Research, Middletown, NJ 07748 (sb646f@att.com); Ness Shroff, The Ohio State University, Columbus, Ohio 43710 (shroff.11@osu.edu) |
| Pseudocode | Yes | "Algorithm 1: Meta-algorithm for Information Directed Sampling with Graph Feedback" |
| Open Source Code | No | The paper does not provide any concrete access (links or explicit statements) to open-source code for the described methodology. |
| Open Datasets | No | Section 7, "Numerical Results", describes a simulated Beta-Bernoulli bandit problem in which the mean rewards are drawn from a Beta(1, 1) distribution. This is an internally generated simulation environment, not a publicly available dataset with concrete access information (link or citation). |
| Dataset Splits | No | The paper reports running experiments for a given time horizon T and averaging results over 1000 trials, but it does not specify training, validation, or test splits in the conventional sense, since the data is generated through simulation rather than drawn from a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing "Algorithm 2 in Russo and Van Roy (2014)" and using Beta-Bernoulli bandits, but does not list any software dependencies with version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | "In the experiment, we set K = 5 and T = 1000. All the regret results are averaged over 1000 trials." |
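
The table shows that the paper gives pseudocode (Algorithm 1) and a precise simulation setup (K = 5, T = 1000, Beta(1, 1) Beta-Bernoulli bandits, 1000 trials) but no code, data, or dependency list. For readers attempting a reproduction, below is a minimal NumPy sketch of a single trial. It is a sketch under stated assumptions, not the authors' implementation: the variance-based information-gain estimate follows the style of Russo and Van Roy (2014), which the paper builds on, and the complete feedback graph and Monte Carlo sample size M are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def ids_graph_bandit(G, mu_true, T=1000, M=1000, n_grid=101):
    """One trial of sample-based IDS on a Beta-Bernoulli bandit with
    graph feedback. G[a] is the set of arms observed when arm a is
    played (assumed to contain a itself, i.e., self-loops)."""
    K = len(mu_true)
    alpha, beta = np.ones(K), np.ones(K)    # Beta(1, 1) posteriors
    mu_star = mu_true.max()
    qs = np.linspace(0.0, 1.0, n_grid)      # grid for the mixing weight
    regret = np.zeros(T)

    for t in range(T):
        # Monte Carlo posterior samples: theta[m, a] ~ Beta(alpha_a, beta_a)
        theta = rng.beta(alpha, beta, size=(M, K))
        best = theta.argmax(axis=1)         # sampled optimal arm per draw

        # Expected instantaneous regret: Delta_a = E[max_b theta_b] - E[theta_a]
        delta = theta.max(axis=1).mean() - theta.mean(axis=0)

        # Variance-based information-gain proxy per *observable* arm:
        # v_b = Var_{A*}( E[theta_b | A* = a] ), estimated from the samples
        mu_hat = theta.mean(axis=0)
        v = np.zeros(K)
        for a in range(K):
            mask = best == a
            if mask.any():
                v += mask.mean() * (theta[mask].mean(axis=0) - mu_hat) ** 2

        # Graph feedback: playing a reveals the reward of every arm in G[a]
        gain = np.array([v[list(G[a])].sum() for a in range(K)])
        gain = np.maximum(gain, 1e-12)      # guard against division by zero

        # IDS step: the information-ratio minimizer randomizes over at most
        # two arms, so search all pairs with a grid over the weight q
        best_ratio, pick = np.inf, (0, 0, 1.0)
        for a in range(K):
            for b in range(K):
                d = qs * delta[a] + (1 - qs) * delta[b]
                g = qs * gain[a] + (1 - qs) * gain[b]
                ratio = d ** 2 / g
                i = ratio.argmin()
                if ratio[i] < best_ratio:
                    best_ratio, pick = ratio[i], (a, b, qs[i])

        a, b, q = pick
        action = a if rng.random() < q else b
        regret[t] = mu_star - mu_true[action]

        # Update the posterior of every arm whose reward was observed
        for o in G[action]:
            r = rng.random() < mu_true[o]   # Bernoulli(mu_o) reward
            alpha[o] += r
            beta[o] += 1 - r

    return regret.cumsum()

# Setup mirroring the paper's experiment: K = 5, T = 1000, means drawn
# from Beta(1, 1); a complete feedback graph is one illustrative choice.
K = 5
mu = rng.beta(1.0, 1.0, size=K)
G_full = {a: set(range(K)) for a in range(K)}
print(ids_graph_bandit(G_full, mu)[-1])    # cumulative regret after T rounds
```

The pair-plus-grid search uses the fact that the information-ratio minimizer can be taken to randomize over at most two actions; the paper's reported curves additionally average 1000 such trials and compare against adaptations of UCB, ϵ-greedy, and Exp3.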