Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Authors: Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar

ICML 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, using both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as benchmarking and validating its accuracy." |
| Researcher Affiliation | Academia | "¹Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK; ²Department of Electrical Engineering, University of California, Los Angeles, USA." |
| Pseudocode | Yes | "Algorithm 1 (Bayesian ICB) summarizes the overall procedure." "Algorithm 2 (Nonparametric Bayesian ICB) summarizes the overall sampling procedure." |
| Open Source Code | Yes | "Code to replicate our main results is made available at https://github.com/alihanhyk/invconban and https://github.com/vanderschaarlab/invconban." |
| Open Datasets | Yes | "Decision Environments: We consider data from the Organ Procurement & Transplantation Network (OPTN) as of Dec. 4, 2020, which consists of patients registered for liver transplantation from 1995 to 2020 [62]." |
| Dataset Splits | No | The paper describes the data sources and sampling procedures but does not explicitly provide train/validation/test splits (e.g., percentages or counts) or cross-validation details needed to reproduce the experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9"). |
| Experiment Setup | Yes | "All agents select actions stochastically as described in (3) with α = 20." "Sampling: We set σ = 0.10." "B-ICB: We have set σ = 0.10, α = 20, and N = 1000 (with an additional 1000 samples as burn-in). When taking gradient steps, we have used the RMSprop optimizer with a learning rate of 0.001 and a discount factor of 0.9. We have run our algorithm for 100 iterations." "NB-ICB: We have set Σ_P = 5 × 10⁴ I and Σ_B = 5 × 10⁵ I. We have taken 1,000 samples from P(β_{1:T} ∣ D) with an interval of 10 iterations between each sample after 10,000 burn-in iterations (i.e., N = 20,000)." |
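The NB-ICB sampling schedule quoted above can be sanity-checked with a short sketch. The burn-in/thinning interpretation here (discard 10,000 iterations, then keep every 10th sample until 1,000 are collected) is an assumption based on the quoted description, not the authors' actual implementation:

```python
# Assumed MCMC schedule for NB-ICB, reconstructed from the quoted setup:
# 10,000 burn-in iterations, then a sample kept every 10 iterations
# until 1,000 samples are collected.
BURN_IN = 10_000   # burn-in iterations (discarded)
THIN = 10          # interval between retained samples
N_SAMPLES = 1_000  # samples drawn from P(beta_{1:T} | D)

total_iterations = BURN_IN + N_SAMPLES * THIN

# Iterations at which a sample would be retained under this schedule.
kept = [it for it in range(1, total_iterations + 1)
        if it > BURN_IN and (it - BURN_IN) % THIN == 0]

assert len(kept) == N_SAMPLES
assert total_iterations == 20_000  # consistent with "N = 20,000" above
```

The arithmetic confirms that the quoted figures are internally consistent: 10,000 burn-in iterations plus 1,000 samples spaced 10 iterations apart account for exactly N = 20,000 total iterations.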