reproducibilityindex.ai

Transfer Learning in Multi-Armed Bandits: A Causal Approach

Authors: Junzhe Zhang, Elias Bareinboim

IJCAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We formally prove that our strategy dominates previously known algorithms and achieves orders of magnitude faster convergence rates than these algorithms. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
Researcher Affiliation	Academia	Junzhe Zhang and Elias Bareinboim Purdue University, USA {zhang745,eb}@purdue.edu
Pseudocode	Yes	Algorithm 1: B-kl-UCB; Algorithm 2: B-TS for Bernoulli Bandits
Open Source Code	No	The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets	No	The paper describes generating data for simulations ("5000 samples generated by a source agent") and uses "2-armed Bernoulli bandits" but does not specify a publicly available dataset or provide a link/citation to one.
Dataset Splits	No	The paper describes its simulation setup as "Simulations are partitioned into rounds of T = 5000 trials averaged over N = 200 repetitions" but does not specify train/validation/test splits for a dataset.
Hardware Specification	No	The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The paper does not specify any software names with version numbers for reproducibility (e.g., programming languages, libraries, frameworks).
Experiment Setup	Yes	Simulations are partitioned into rounds of T = 5000 trials averaged over N = 200 repetitions. For each task, we collect 5000 samples generated by a source agent and compute the empirical joint distribution. The causal bounds are estimated with the methods described in Sec. 4 from the empirical joint distributions. We assess each algorithm's performance with cumulative regrets (CR). Task 1. The expected rewards of the given parametrization are µ1 = 0.66, µ2 = 0.36, and the estimated causal bounds are b1 = [0.03, 0.76], b2 = [0.21, 0.51]. Task 2. The expected rewards of the given param. are µ1 = 0.58, µ2 = 0.74 and the estimated causal bounds are b1 = [0.48, 0.61], b2 = [0.7, 0.83]. Task 3. The expected rewards are µ1 = 0.2, µ2 = 0.4 and the estimated causal bounds are b1 = b2 = [0, 0.61].
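
Since no source code is released, the experiment setup above can only be approximated. The following is a minimal sketch of such a simulation loop, not the authors' implementation. It runs plain Thompson sampling on a 2-armed Bernoulli bandit and averages cumulative regret over repetitions. The function names `run_thompson` and `average_regret` are hypothetical. Two assumptions are made for illustration: regret is accumulated as expected (pseudo-)regret, and the optional `bounds` argument simply clips each posterior sample to the interval for that arm, which is a stand-in and not the paper's B-TS (Algorithm 2).

```python
import random


def run_thompson(mu, horizon, rng, bounds=None):
    """Return the cumulative regret curve of one Thompson sampling run.

    mu      -- true expected reward of each Bernoulli arm
    horizon -- number of trials T
    rng     -- a random.Random instance
    bounds  -- optional list of (low, high) intervals, one per arm
               (assumption: used only to clip posterior samples)
    """
    k = len(mu)
    wins = [0] * k      # observed rewards of 1 per arm
    losses = [0] * k    # observed rewards of 0 per arm
    best = max(mu)
    regret = 0.0
    curve = []
    for _ in range(horizon):
        samples = []
        for a in range(k):
            # Beta(1, 1) prior updated with the observed counts.
            theta = rng.betavariate(wins[a] + 1, losses[a] + 1)
            if bounds is not None:
                low, high = bounds[a]
                # Assumption: clipping stands in for the paper's B-TS step.
                theta = min(max(theta, low), high)
            samples.append(theta)
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < mu[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        # Expected (pseudo-)regret of the chosen arm.
        regret += best - mu[arm]
        curve.append(regret)
    return curve


def average_regret(mu, horizon, repetitions, seed=0, bounds=None):
    """Average the cumulative regret curve over independent repetitions."""
    rng = random.Random(seed)
    total = [0.0] * horizon
    for _ in range(repetitions):
        curve = run_thompson(mu, horizon, rng, bounds)
        for t, value in enumerate(curve):
            total[t] += value
    return [value / repetitions for value in total]


if __name__ == "__main__":
    # Task 1 parameters from the setup above; T and N are reduced here
    # to keep the run short (the setup states T = 5000, N = 200).
    curve = average_regret(
        [0.66, 0.36], horizon=500, repetitions=20,
        bounds=[(0.03, 0.76), (0.21, 0.51)],
    )
    print(f"average cumulative regret after 500 trials: {curve[-1]:.2f}")
```

Passing `horizon=5000` and `repetitions=200` would match the T and N stated in the setup, at the cost of a longer run in pure Python. With the Task 2 bounds, the clipped sample for arm 1 can never exceed 0.61 while the sample for arm 2 is at least 0.7, so this sketch always pulls arm 2 and accumulates zero regret.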