Transfer Learning in Multi-Armed Bandits: A Causal Approach

Authors: Junzhe Zhang, Elias Bareinboim

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We formally prove that our strategy dominates previously known algorithms and achieves orders of magnitude faster convergence rates than these algorithms. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods."
Researcher Affiliation | Academia | Junzhe Zhang and Elias Bareinboim, Purdue University, USA, {zhang745,eb}@purdue.edu
Pseudocode | Yes | Algorithm 1: B-kl-UCB; Algorithm 2: B-TS for Bernoulli Bandits
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper describes generating data for simulations ("5000 samples generated by a source agent") and uses "2-armed Bernoulli bandits", but does not specify a publicly available dataset or provide a link or citation to one.
Dataset Splits | No | The paper describes its simulation setup as "Simulations are partitioned into rounds of T = 5000 trials averaged over N = 200 repetitions" but does not specify train/validation/test splits for a dataset.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not list any software with version numbers (e.g., programming languages, libraries, frameworks) needed for reproducibility.
Experiment Setup | Yes | Simulations are partitioned into rounds of T = 5000 trials averaged over N = 200 repetitions. For each task, we collect 5000 samples generated by a source agent and compute the empirical joint distribution. The causal bounds are estimated with the methods described in Sec. 4 from the empirical joint distributions. We assess each algorithm's performance with cumulative regrets (CR). Task 1: the expected rewards of the given parametrization are µ1 = 0.66, µ2 = 0.36, and the estimated causal bounds are b1 = [0.03, 0.76], b2 = [0.21, 0.51]. Task 2: the expected rewards are µ1 = 0.58, µ2 = 0.74, and the estimated causal bounds are b1 = [0.48, 0.61], b2 = [0.7, 0.83]. Task 3: the expected rewards are µ1 = 0.2, µ2 = 0.4, and the estimated causal bounds are b1 = b2 = [0, 0.61].
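Since the paper releases no code, the Experiment Setup row can be turned into a rough reproduction sketch. The following is a minimal, hypothetical Python sketch (not the authors' implementation): a standard kl-UCB learner for Bernoulli arms whose per-arm indices are clipped at the causal upper bounds, in the spirit of the paper's B-kl-UCB, run once on Task 1's parameters (µ1 = 0.66, µ2 = 0.36, upper bounds 0.76 and 0.51). All function names and the single-run structure are assumptions; the paper averages over N = 200 repetitions.

```python
import math
import random

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clamping."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, n, t):
    """Largest q with n * KL(mean, q) <= log(t), found by bisection."""
    target = math.log(max(t, 2)) / n
    lo, hi = mean, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def run_clipped_kl_ucb(mus, upper_bounds, T=5000, seed=0):
    """One run of kl-UCB with indices clipped at causal upper bounds.

    Returns the cumulative regret over T trials (a sketch of the
    paper's CR metric for a single repetition).
    """
    rng = random.Random(seed)
    K = len(mus)
    counts = [0] * K
    sums = [0.0] * K
    regret = 0.0
    best = max(mus)
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1  # pull each arm once to initialize
        else:
            # B-kl-UCB idea: never let an arm's index exceed its causal bound
            idx = [min(kl_ucb_index(sums[k] / counts[k], counts[k], t),
                       upper_bounds[k]) for k in range(K)]
            arm = max(range(K), key=lambda k: idx[k])
        reward = 1.0 if rng.random() < mus[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - mus[arm]
    return regret

# Task 1 parameters from the Experiment Setup row (upper ends of b1, b2)
reg = run_clipped_kl_ucb([0.66, 0.36], [0.76, 0.51], T=5000)
```

Clipping matters in tasks like Task 1, where arm 2's causal upper bound (0.51) sits below arm 1's true mean (0.66): the suboptimal arm can never look better than the optimum, so exploration of it is cut short.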