Learning "What-if" Explanations for Sequential Decision-Making

Authors: Ioana Bica, Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through illustrative experiments in both real and simulated medical environments, we highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior."
Researcher Affiliation | Academia | Ioana Bica (University of Oxford, Oxford, UK; The Alan Turing Institute, London, UK; ioana.bica@eng.ox.ac.uk); Daniel Jarrett (University of Cambridge, Cambridge, UK; daniel.jarrett@maths.cam.ac.uk); Alihan Hüyük (University of Cambridge, Cambridge, UK; ah2075@cam.ac.uk); Mihaela van der Schaar (University of Cambridge, Cambridge, UK; Cambridge Center for AI in Medicine, UK; University of California, Los Angeles, USA; The Alan Turing Institute, London, UK; mv472@cam.ac.uk)
Pseudocode | Yes | Algorithm 1, "(Batch, Max-Margin) CIRL", and Algorithm 2, "Counterfactual µ learning"; a generic sketch of the max-margin loop appears after this table.
Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | "We also perform a case study on an ICU dataset from the MIMIC III database (Johnson et al., 2016)."
Dataset Splits | Yes | "For each simulated batch observational dataset, we use 9000 samples for training the Counterfactual Recurrent Network and 1000 for validation (hyperparameter optimization)."
Hardware Specification | Yes | "The experiments were run on a system with 2 NVIDIA K80 Tesla GPUs, 12 CPUs, and 112 GB of RAM."
Software Dependencies | No | The paper lists 'Adam' as the optimizer in several hyperparameter tables (e.g., Tables 5, 6, and 7) but does not provide version numbers for any software libraries or dependencies used in the implementation, such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | Table 5: hyperparameters for training the µ-network that estimates feature expectations. Table 6: hyperparameters for training the Q-network that finds the optimal policy for a given setting of the reward weights. Table 7: hyperparameters for training the Q-network that solves the simulated environment. These tables list specific values for LSTM size, batch size, learning rate, target network update M, min ε, max ε, ε decay, number of training iterations, and optimizer; an illustrative configuration sketch appears after this table.
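
For readers unfamiliar with the max-margin machinery behind Algorithm 1, the outer loop follows the projection-style apprenticeship-learning recipe of Abbeel & Ng (2004): alternate between fitting reward weights that separate the demonstrator's (counterfactual) feature expectations from those of the candidate policies found so far, and computing the optimal policy for those weights. The sketch below is a minimal, generic version of that loop, not the authors' code; `estimate_feature_expectations` and `solve_optimal_policy` are hypothetical stand-ins for the paper's µ-network (Algorithm 2) and Q-network training.

```python
import numpy as np

def max_margin_irl(mu_expert, estimate_feature_expectations, solve_optimal_policy,
                   n_iterations=20, tol=1e-3):
    """Projection-style max-margin IRL outer loop (Abbeel & Ng, 2004).

    mu_expert: feature expectations of the demonstrated behaviour.
    estimate_feature_expectations(policy) -> feature expectations of `policy`
        (hypothetical stand-in for the counterfactual mu-network of Algorithm 2).
    solve_optimal_policy(w) -> optimal policy for reward weights `w`
        (hypothetical stand-in for Q-network training).
    """
    # Start from an arbitrary policy (here: zero reward weights).
    policy = solve_optimal_policy(np.zeros_like(mu_expert))
    mu_bar = estimate_feature_expectations(policy)
    w = mu_expert - mu_bar

    for _ in range(n_iterations):
        # Max-margin step: weights point from the closest convex combination of
        # candidate feature expectations towards the demonstrator's.
        w = mu_expert - mu_bar
        if np.linalg.norm(w) < tol:
            break  # candidate policies already match the demonstrator
        # Best-response step: optimal policy under the current reward weights.
        policy = solve_optimal_policy(w)
        mu_new = estimate_feature_expectations(policy)
        # Projection update of mu_bar along the direction towards mu_new.
        d = mu_new - mu_bar
        step = np.dot(mu_expert - mu_bar, d) / max(np.dot(d, d), 1e-12)
        mu_bar = mu_bar + np.clip(step, 0.0, 1.0) * d

    return w, policy
```

In the paper's batch, counterfactual setting, both stand-ins must work from the observational dataset rather than from online rollouts of the environment.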
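
To make the Experiment Setup row concrete, the fields quoted from Tables 5–7 map naturally onto a small configuration object like the one sketched below. Only the field names and the 9,000 train / 1,000 validation split quoted in the Dataset Splits row come from the report; the numeric hyperparameter values are placeholders, since the actual settings in Tables 5–7 are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class QNetworkTrainingConfig:
    """Hyperparameter fields named in Tables 5-7. The values below are
    illustrative placeholders, not the settings reported in the paper."""
    lstm_size: int = 64
    batch_size: int = 128
    learning_rate: float = 1e-3
    target_network_update_m: int = 100   # "Target network update M"
    min_epsilon: float = 0.05            # "Min ε" (epsilon-greedy exploration floor)
    max_epsilon: float = 1.0             # "Max ε"
    epsilon_decay: float = 0.999         # "ε decay"
    num_training_iterations: int = 10_000
    optimizer: str = "Adam"              # the only dependency named in the paper

def train_validation_split(samples, n_train=9000, n_val=1000):
    """Split matching the reported 9,000 training / 1,000 validation samples
    per simulated batch observational dataset (validation used for
    hyperparameter optimization)."""
    assert len(samples) >= n_train + n_val
    return samples[:n_train], samples[n_train:n_train + n_val]
```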