Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Authors: Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar

ICML 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, using both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as benchmarking and validating its accuracy." |
| Researcher Affiliation | Academia | "¹Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK; ²Department of Electrical Engineering, University of California, Los Angeles, USA." |
| Pseudocode | Yes | "Algorithm 1 (Bayesian ICB) summarizes the overall procedure." "Algorithm 2 (Nonparametric Bayesian ICB) summarizes the overall sampling procedure." |
| Open Source Code | Yes | "Code to replicate our main results is made available at https://github.com/alihanhyk/invconban and https://github.com/vanderschaarlab/invconban." |
| Open Datasets | Yes | "Decision Environments: We consider data from the Organ Procurement & Transplantation Network (OPTN) as of Dec. 4, 2020, which consists of patients registered for liver transplantation from 1995 to 2020 [62]." |
| Dataset Splits | No | The paper describes the data sources and sampling procedures but does not explicitly provide train/validation/test splits (e.g., percentages or counts) or cross-validation details needed to reproduce the experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9"). |
| Experiment Setup | Yes | "All agents select actions stochastically as described in (3) with α = 20." "Sampling: We set σ = 0.10." "B-ICB: We have set σ = 0.10, α = 20, and N = 1000 (with an additional 1000 samples as burn-in). When taking gradient steps, we have used the RMSprop optimizer with a learning rate of 0.001 and a discount factor of 0.9. We have run our algorithm for 100 iterations." "NB-ICB: We have set Σ_P = 5 × 10⁴ I and Σ_B = 5 × 10⁵ I. We have taken 1,000 samples from P(β_{1:T} ∣ D) with an interval of 10 iterations between each sample after 10,000 burn-in iterations (i.e., N = 20,000)." |
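The NB-ICB sampling schedule quoted above can be sanity-checked with a short sketch. The burn-in/thinning interpretation here (discard 10,000 iterations, then keep every 10th sample until 1,000 are collected) is an assumption based on the quoted description, not the authors' actual implementation:

```python
# Assumed MCMC schedule for NB-ICB, reconstructed from the quoted setup:
# 10,000 burn-in iterations, then a sample kept every 10 iterations
# until 1,000 samples are collected.
BURN_IN = 10_000   # burn-in iterations (discarded)
THIN = 10          # interval between retained samples
N_SAMPLES = 1_000  # samples drawn from P(beta_{1:T} | D)

total_iterations = BURN_IN + N_SAMPLES * THIN

# Iterations at which a sample would be retained under this schedule.
kept = [it for it in range(1, total_iterations + 1)
        if it > BURN_IN and (it - BURN_IN) % THIN == 0]

assert len(kept) == N_SAMPLES
assert total_iterations == 20_000  # consistent with "N = 20,000" above
```

The arithmetic confirms that the quoted figures are internally consistent: 10,000 burn-in iterations plus 1,000 samples spaced 10 iterations apart account for exactly N = 20,000 total iterations.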