Leveraging Observations in Bandits: Between Risks and Benefits
Authors: Andrei Lupu, Audrey Durand, Doina Precup (pp. 6112-6119)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application. (Section 6: Experiments) |
| Researcher Affiliation | Academia | Andrei Lupu, School of Computer Science, McGill University, andrei.lupu@mail.mcgill.ca; Audrey Durand, School of Computer Science, McGill University, audrey.durand@mcgill.ca; Doina Precup, School of Computer Science, McGill University, dprecup@cs.mcgill.ca |
| Pseudocode | Yes | Algorithm 1 Target-UCB for rewards in [0, 1]. |
| Open Source Code | Yes | We also provide an implementation of the Target-UCB algorithm: https://github.com/lupuandr/Target-UCB. |
| Open Datasets | Yes | The resulting dataset can be found at: https://github.com/lupuandr/Target-UCB/tree/master/Human%20bandit%20dataset. |
| Dataset Splits | No | The paper discusses different experimental settings (e.g., 2-actions problem, different α values, multi-agent settings) and uses Bernoulli reward distributions, but does not provide specific train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU or CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not mention specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions) used in the experiments. |
| Experiment Setup | Yes | The following experiments evaluate the potential of Target-UCB (C=2) in various settings. Bernoulli reward distributions are used in all experiments. Unless indicated otherwise, all results are obtained by averaging over 2000 independent runs. In all figures, shaded areas indicate one standard deviation above the mean. The clique size is selected using Equation (6) for δ = 0.001. |
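The experiment setup above (Bernoulli arms, many independent runs averaged, mean ± one standard deviation reported) follows the standard bandit evaluation protocol. As a minimal sketch of that protocol, the snippet below runs plain UCB1 on two Bernoulli arms and averages cumulative regret over repeated runs; it is a generic baseline illustration, not the paper's Target-UCB algorithm, and the arm means, horizon, and run count are placeholder values rather than the paper's settings.

```python
import numpy as np

def ucb1_regret(means, horizon, rng):
    """Run one UCB1 episode on Bernoulli arms; return cumulative pseudo-regret."""
    k = len(means)
    counts = np.zeros(k)   # number of pulls per arm
    sums = np.zeros(k)     # total reward per arm
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            a = t  # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean + exploration bonus
            ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
            a = int(np.argmax(ucb))
        reward = rng.binomial(1, means[a])
        counts[a] += 1
        sums[a] += reward
        regret += best - means[a]
    return regret

def average_regret(means, horizon=1000, runs=100, seed=0):
    """Average cumulative regret over independent runs, as in the evaluation protocol."""
    rng = np.random.default_rng(seed)
    regrets = [ucb1_regret(means, horizon, rng) for _ in range(runs)]
    return float(np.mean(regrets)), float(np.std(regrets))

mean_regret, std_regret = average_regret([0.5, 0.7])
```

Plotting `mean_regret` with a shaded band of one standard deviation reproduces the figure style described in the setup.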