Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Leveraging Observations in Bandits: Between Risks and Benefits

Authors: Andrei Lupu, Audrey Durand, Doina Precup

AAAI 2019, pp. 6112-6119 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application." and "6 Experiments"
Researcher Affiliation | Academia | "Andrei Lupu, School of Computer Science, McGill University (EMAIL); Audrey Durand, School of Computer Science, McGill University (EMAIL); Doina Precup, School of Computer Science, McGill University (EMAIL)"
Pseudocode | Yes | "Algorithm 1 Target-UCB for rewards in [0, 1]."
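The paper's Algorithm 1 (Target-UCB) is not reproduced in this report. For orientation only, here is a minimal sketch of the classic UCB index for rewards in [0, 1] that Target-UCB builds on; the function names and the exploration constant `c` are illustrative assumptions, and this is not the authors' algorithm, which additionally incorporates observations of a target's action counts.

```python
import math

def ucb1(pull_arm, n_arms, horizon, c=2.0):
    """Classic UCB for rewards in [0, 1] (illustrative sketch).

    pull_arm(a) returns a stochastic reward in [0, 1]; c scales the
    exploration bonus. Target-UCB (Algorithm 1 in the paper) further
    biases arm selection toward a target's observed action counts,
    which is NOT modeled here.
    """
    counts = [0] * n_arms       # pulls per arm
    sums = [0.0] * n_arms       # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1         # initialize: pull each arm once
        else:
            # pick the arm maximizing empirical mean + exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = pull_arm(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

On a two-armed Bernoulli problem (the reward model used in the paper's experiments), this index concentrates pulls on the better arm as the horizon grows.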
Open Source Code | Yes | "We also provide an implementation of the Target-UCB algorithm: https://github.com/lupuandr/Target-UCB"
Open Datasets | Yes | "The resulting dataset can be found at: https://github.com/lupuandr/Target-UCB/tree/master/Human%20bandit%20dataset"
Dataset Splits | No | The paper discusses different experimental settings (e.g., the 2-action problem, different α values, multi-agent settings) and uses Bernoulli reward distributions, but does not provide specific train/validation/test dataset splits.
Hardware Specification | No | The paper does not specify hardware details such as GPU or CPU models or memory capacity used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies or their version numbers (e.g., Python, PyTorch, or TensorFlow versions) used in the experiments.
Experiment Setup | Yes | "The following experiments evaluate the potential of Target-UCB (C=2) in various settings. Bernoulli reward distributions are used in all experiments. Unless indicated otherwise, all results are obtained by averaging over 2000 independent runs. In all figures, shaded areas indicate one standard deviation above the mean." and "The clique size is selected using Equation (6) for δ = 0.001."
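The reporting protocol quoted above (mean over 2000 independent runs, shaded one standard deviation) can be sketched as follows. The arm means, the uniform-random placeholder policy, and the pseudo-regret metric are illustrative assumptions, not the paper's exact harness; a replication would substitute Target-UCB for the random arm choice.

```python
import random
import statistics

ARM_MEANS = [0.2, 0.8]  # hypothetical Bernoulli arm means, not from the paper

def one_run(horizon, rng):
    """One independent run: a uniform-random policy on Bernoulli arms,
    returning cumulative pseudo-regret at the horizon."""
    best = max(ARM_MEANS)
    regret = 0.0
    for _ in range(horizon):
        arm = rng.randrange(len(ARM_MEANS))   # placeholder policy
        regret += best - ARM_MEANS[arm]       # expected per-step regret
    return regret

def averaged(horizon, n_runs=2000, seed=0):
    """Mean and one standard deviation over n_runs independent runs,
    mirroring the paper's reporting protocol (mean with std shading)."""
    rng = random.Random(seed)
    samples = [one_run(horizon, rng) for _ in range(n_runs)]
    return statistics.mean(samples), statistics.stdev(samples)
```

The returned pair corresponds to one point of a mean curve and its one-standard-deviation band; evaluating it at every time step would reproduce the shaded plots described in the setup.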