Transfer Learning in Multi-Armed Bandits: A Causal Approach
Authors: Junzhe Zhang, Elias Bareinboim
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We formally prove that our strategy dominates previously known algorithms and achieves orders of magnitude faster convergence rates than these algorithms. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods. |
| Researcher Affiliation | Academia | Junzhe Zhang and Elias Bareinboim Purdue University, USA {zhang745,eb}@purdue.edu |
| Pseudocode | Yes | Algorithm 1: B-kl-UCB; Algorithm 2: B-TS for Bernoulli Bandits |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper describes generating data for simulations ("5000 samples generated by a source agent") and uses "2-armed Bernoulli bandits" but does not specify a publicly available dataset or provide a link/citation to one. |
| Dataset Splits | No | The paper describes its simulation setup as "Simulations are partitioned into rounds of T = 5000 trials averaged over N = 200 repetitions" but does not specify train/validation/test splits for a dataset. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software names with version numbers for reproducibility (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | Simulations are partitioned into rounds of T = 5000 trials averaged over N = 200 repetitions. For each task, we collect 5000 samples generated by a source agent and compute the empirical joint distribution. The causal bounds are estimated with the methods described in Sec. 4 from the empirical joint distributions. We assess each algorithm's performance with cumulative regrets (CR). Task 1. The expected rewards of the given parametrization are µ1 = 0.66, µ2 = 0.36, and the estimated causal bounds are b1 = [0.03, 0.76], b2 = [0.21, 0.51]. Task 2. The expected rewards of the given parametrization are µ1 = 0.58, µ2 = 0.74, and the estimated causal bounds are b1 = [0.48, 0.61], b2 = [0.7, 0.83]. Task 3. The expected rewards are µ1 = 0.2, µ2 = 0.4, and the estimated causal bounds are b1 = b2 = [0, 0.61]. |
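
Since no source code is released, the experiment setup can only be approximated. The sketch below is one possible reading of the protocol, not the authors' implementation: it runs a 2-armed Bernoulli bandit for T trials, averages cumulative regret over N repetitions, and uses the reported expected rewards and causal bounds. The learner is ordinary Thompson sampling whose posterior draws are clipped to each arm's bounds, which is a stand-in chosen for brevity and is not the paper's B-kl-UCB or B-TS. The names `run_bandit`, `average_regret`, and `TASKS` are hypothetical.

```python
import random

# Expected rewards and estimated causal bounds as reported in the table above.
TASKS = {
    "Task 1": ([0.66, 0.36], [(0.03, 0.76), (0.21, 0.51)]),
    "Task 2": ([0.58, 0.74], [(0.48, 0.61), (0.70, 0.83)]),
    "Task 3": ([0.20, 0.40], [(0.00, 0.61), (0.00, 0.61)]),
}


def run_bandit(mu, bounds, T, rng):
    """One repetition; returns the cumulative regret after each trial.

    Assumption: Thompson sampling with Beta(1, 1) priors, with each
    posterior draw clipped to the arm's [lo, hi] bounds. This is an
    illustrative stand-in, not the algorithm from the paper.
    """
    best = max(mu)
    wins = [0] * len(mu)
    losses = [0] * len(mu)
    regret, curve = 0.0, []
    for _ in range(T):
        samples = []
        for arm, (lo, hi) in enumerate(bounds):
            theta = rng.betavariate(wins[arm] + 1, losses[arm] + 1)
            samples.append(min(max(theta, lo), hi))  # clip to the bounds
        arm = max(range(len(mu)), key=samples.__getitem__)
        reward = 1 if rng.random() < mu[arm] else 0  # Bernoulli reward
        wins[arm] += reward
        losses[arm] += 1 - reward
        regret += best - mu[arm]  # regret measured on expected rewards
        curve.append(regret)
    return curve


def average_regret(mu, bounds, T=5000, N=200, seed=0):
    """Average the cumulative-regret curve over N independent repetitions."""
    rng = random.Random(seed)
    total = [0.0] * T
    for _ in range(N):
        for t, r in enumerate(run_bandit(mu, bounds, T, rng)):
            total[t] += r
    return [x / N for x in total]


if __name__ == "__main__":
    # Smaller T and N than the reported setup, to keep the demo quick.
    for name, (mu, bounds) in TASKS.items():
        curve = average_regret(mu, bounds, T=500, N=20)
        print(name, "final average cumulative regret:", round(curve[-1], 2))
```

In Task 2 the upper bound of arm 1 (0.61) lies below the lower bound of arm 2 (0.7), so under this clipping rule arm 2 is selected on every trial and the cumulative regret stays at zero. In Task 3 both arms share the bounds [0, 0.61], so the bounds cannot separate the arms and the learner has to rely on its own samples.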