Inverse Contextual Bandits: Learning How Behavior Evolves over Time
Authors: Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, using both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as benchmarking and validating its accuracy. |
| Researcher Affiliation | Academia | 1Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK 2Department of Electrical Engineering, University of California, Los Angeles, USA. |
| Pseudocode | Yes | Algorithm 1 (Bayesian ICB) summarizes the overall procedure. Algorithm 2 (Nonparametric Bayesian ICB) summarizes the overall sampling procedure. |
| Open Source Code | Yes | Code to replicate our main results is made available at https://github.com/alihanhyk/invconban and https://github.com/vanderschaarlab/invconban. |
| Open Datasets | Yes | Decision Environments: We consider data from the Organ Procurement & Transplantation Network (OPTN) as of Dec. 4, 2020, which consists of patients registered for liver transplantation from 1995 to 2020 [62]. |
| Dataset Splits | No | The paper describes the data sources and sampling procedures but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages or counts) or cross-validation details for reproducing the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9'). |
| Experiment Setup | Yes | All agents select actions stochastically as described in (3) with α = 20. Sampling: We set σ = 0.10. B-ICB: We have set σ = 0.10, α = 20, and N = 1000 (with an additional 1000 samples as burn-in). When taking gradient steps, we have used the RMSprop optimizer with a learning rate of 0.001 and a discount factor of 0.9. We have run our algorithm for 100 iterations. NB-ICB: We have set ΣP = 5×10⁻⁴ I and ΣB = 5×10⁻⁵ I. We have taken 1,000 samples from P(β1:T \| D) with an interval of 10 iterations between each sample after 10,000 burn-in iterations (i.e., N = 20,000). |
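The NB-ICB sampling schedule quoted above (10,000 burn-in iterations, then one sample kept every 10 iterations until 1,000 samples are collected, for N = 20,000 total iterations) can be sketched as a generic MCMC thinning loop. This is an illustrative reconstruction, not the authors' code; `step` and `state` are hypothetical placeholders for the paper's actual sampler transition and its state.

```python
# Sampling schedule from the quoted setup: burn-in, then thinning.
BURN_IN = 10_000    # iterations discarded before collecting samples
THIN = 10           # interval between retained samples
N_SAMPLES = 1_000   # number of samples kept
# Total iterations: BURN_IN + THIN * N_SAMPLES = 20,000 (the paper's N)

def run_sampler(step, state):
    """Run `step` for 20,000 iterations, keeping every 10th state
    after the 10,000-iteration burn-in period."""
    samples = []
    for i in range(BURN_IN + THIN * N_SAMPLES):
        state = step(state)
        if i >= BURN_IN and (i - BURN_IN) % THIN == 0:
            samples.append(state)
    return samples
```

With a trivial transition such as `step = lambda s: s + 1` starting from `state = 0`, the loop runs exactly 20,000 iterations and returns 1,000 retained samples, matching the schedule described in the paper.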