Transportability for Bandits with Data from Different Environments

Authors: Alexis Bellot, Alan Malek, Silvia Chiappa

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed approach on several synthetic scenarios inspired by the literature on clinical trials and advertising. We compare Thompson sampling with additional data sources (tTS, Alg. 1) with Thompson sampling with uninformative priors (TS) [38], a KL-UCB [9] algorithm with uninformative priors (UCB), and as a baseline also include the algorithm that chooses actions uniformly at random (Uniform). For all algorithms, we measure their regrets R_T, averaged over 10 repetitions.
Researcher Affiliation | Industry | Alexis Bellot, Alan Malek, Silvia Chiappa. Google DeepMind, London, UK. abellot@google.com
Pseudocode | Yes | Algorithm 1: Thompson Sampling with Transportability (tTS). Input: selection diagrams {G^{*,a}, G^{*,b}, ...}, prior data v := (v^a, v^b, ...), decision variable X, reward variable Y, horizon T. For rounds t = 1, 2, ..., T do: (1) approximate P(ξ, θ | v, v_{x^(1)}, ..., v_{x^(t-1)}); (2) sample ξ^(t), θ^(t) ~ P(ξ, θ | v, v_{x^(1)}, ..., v_{x^(t-1)}); (3) set x^(t) ← arg max_x E_P[Y_x | ξ^(t), θ^(t)]; (4) take action x^(t) and observe v_{x^(t)} in π^*. End for.
Open Source Code | No | No mention of code availability or repository links for the described methodology.
Open Datasets | No | We evaluate the proposed approach on several synthetic scenarios inspired by the literature on clinical trials and advertising.
Dataset Splits | No | Specifically, with this model, 1000 prior data samples are given from an environment π^a that differs in the causal assignment of Z in comparison with the deployment environment π^*. (This describes the source of the data, not specific training/validation/test splits.)
Hardware Specification | No | No specific hardware details are mentioned for running the experiments.
Software Dependencies | No | No software names with version numbers are provided.
Experiment Setup | No | Details on all data generating mechanisms and a discussion on mis-specification and limitations of the proposed approach can be found in Appendix D and Appendix B, respectively. (These provide context for the experiments but lack specific hyperparameter values for the algorithms themselves.)
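
The sample-then-act loop in the Pseudocode row, and the regret averaged over 10 repetitions mentioned in the Research Type row, can be illustrated with a much simpler stand-in. The sketch below is not the authors' tTS. It assumes Bernoulli rewards, replaces the posterior over (ξ, θ) with independent Beta posteriors per arm, and assumes that prior data carry over to the deployment environment unchanged, which is exactly the assumption the paper's selection diagrams are meant to relax. The names `thompson_sampling`, `average_regret`, and `prior_counts`, along with all numeric values, are hypothetical.

```python
import random


def thompson_sampling(true_means, horizon, prior_counts=None, seed=0):
    """Beta-Bernoulli Thompson sampling; returns cumulative pseudo-regret.

    prior_counts is a hypothetical per-arm list of (successes, failures)
    pairs summarising prior data. None means a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    if prior_counts is None:
        prior_counts = [(0, 0)] * n_arms
    # Beta(1 + successes, 1 + failures) parameters for each arm.
    alpha = [1 + s for s, _ in prior_counts]
    beta = [1 + f for _, f in prior_counts]
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample one plausible mean reward per arm from the current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled parameters.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update with the observed reward.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        # Pseudo-regret: gap between the best arm's mean and the chosen arm's.
        regret += best_mean - true_means[arm]
    return regret


def average_regret(true_means, horizon, prior_counts=None, repetitions=10):
    """Average the cumulative regret over independently seeded repetitions."""
    runs = [
        thompson_sampling(true_means, horizon, prior_counts, seed=r)
        for r in range(repetitions)
    ]
    return sum(runs) / len(runs)


if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]                     # hypothetical arm means
    prior = [(30, 70), (50, 50), (70, 30)]      # hypothetical prior data
    print("uninformative prior:", average_regret(means, 500))
    print("informative prior:  ", average_regret(means, 500, prior))
```

In this toy setting the informative prior only helps because the prior counts were drawn up to match the deployment means. When the source and deployment environments differ, feeding source data straight into the prior can bias the posterior, which is the situation the paper's transportability machinery addresses.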