reproducibilityindex.ai

Leveraging Observations in Bandits: Between Risks and Benefits

Authors: Andrei Lupu, Audrey Durand, Doina Precup (pp. 6112-6119)

AAAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application. and 6 Experiments
Researcher Affiliation	Academia	Andrei Lupu School of Computer Science McGill University andrei.lupu@mail.mcgill.ca Audrey Durand School of Computer Science McGill University audrey.durand@mcgill.ca Doina Precup School of Computer Science McGill University dprecup@cs.mcgill.ca
Pseudocode	Yes	Algorithm 1 Target-UCB for rewards in [0, 1].
Open Source Code	Yes	We also provide an implementation of the Target-UCB algorithm: https://github.com/lupuandr/Target-UCB.
Open Datasets	Yes	The resulting dataset can be found at: https://github.com/lupuandr/Target-UCB/tree/master/Human%20bandit%20dataset.
Dataset Splits	No	The paper discusses different experimental settings (e.g., 2-actions problem, different α values, multi-agent settings) and uses Bernoulli reward distributions, but does not provide specific train/validation/test dataset splits.
Hardware Specification	No	The paper does not specify any hardware details such as GPU or CPU models, or memory specifications used for running the experiments.
Software Dependencies	No	The paper does not mention specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions) used in the experiments.
Experiment Setup	Yes	The following experiments evaluate the potential of Target-UCB (C=2) in various settings. Bernoulli reward distributions are used in all experiments. Unless indicated otherwise, all results are obtained by averaging over 2000 independent runs. In all figures, shaded areas indicate one standard deviation above the mean. and The clique size is selected using Equation (6) for δ = 0.001.
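
The experiment setup quoted above (Bernoulli rewards, results averaged over independent runs, a spread of one standard deviation) can be illustrated with a minimal sketch. This is not the authors' Target-UCB: it uses a plain UCB index policy as a stand-in, and the bonus form sqrt(c * ln t / n), the function names `run_ucb` and `average_regret`, and the arm means in the usage note are all assumptions made for illustration. For the actual algorithm, see the repository linked in the Open Source Code row.

```python
import math
import random


def run_ucb(means, horizon, c=2.0, seed=None):
    """Run a plain UCB policy on Bernoulli arms (a stand-in, not Target-UCB).

    Returns the cumulative pseudo-regret after `horizon` pulls.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull each arm once
        else:
            # Assumed index: empirical mean plus sqrt(c * ln t / n).
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def average_regret(means, horizon, runs, c=2.0):
    """Average the final regret over independent runs; also return its std."""
    regrets = [run_ucb(means, horizon, c, seed=r) for r in range(runs)]
    mean = sum(regrets) / runs
    var = sum((x - mean) ** 2 for x in regrets) / runs
    return mean, math.sqrt(var)
```

Calling `average_regret([0.9, 0.1], horizon=1000, runs=2000)` would mirror the 2000-run averaging described in the table; the arm means and horizon here are hypothetical values, not taken from the paper.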