Learning "What-if" Explanations for Sequential Decision-Making

Authors: Ioana Bica, Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through illustrative experiments in both real and simulated medical environments, we highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior."
Researcher Affiliation | Academia | Ioana Bica (University of Oxford, Oxford, UK; The Alan Turing Institute, London, UK; ioana.bica@eng.ox.ac.uk); Daniel Jarrett (University of Cambridge, Cambridge, UK; daniel.jarrett@maths.cam.ac.uk); Alihan Hüyük (University of Cambridge, Cambridge, UK; ah2075@cam.ac.uk); Mihaela van der Schaar (University of Cambridge, Cambridge, UK; Cambridge Center for AI in Medicine, UK; University of California, Los Angeles, USA; The Alan Turing Institute, London, UK; mv472@cam.ac.uk)
Pseudocode | Yes | Algorithm 1, "(Batch, Max-Margin) CIRL", and Algorithm 2, "Counterfactual µ learning"; a generic sketch of the max-margin loop appears after this table.
Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | "We also perform a case study on an ICU dataset from the MIMIC III database (Johnson et al., 2016)."
Dataset Splits | Yes | "For each simulated batch observational dataset, we use 9000 samples for training the Counterfactual Recurrent Network and 1000 for validation (hyperparameter optimization)."
Hardware Specification | Yes | "The experiments were run on a system with 2 NVIDIA K80 Tesla GPUs, 12 CPUs, and 112 GB of RAM."
Software Dependencies | No | The paper lists 'Adam' as the optimizer in several hyperparameter tables (e.g., Tables 5, 6, and 7) but does not provide version numbers for any software libraries or dependencies used in the implementation, such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | Table 5: hyperparameters for training the µ-network that estimates feature expectations. Table 6: hyperparameters for training the Q-network that finds the optimal policy for a given setting of the reward weights. Table 7: hyperparameters for training the Q-network that solves the simulated environment. These tables list specific values for LSTM size, batch size, learning rate, target network update M, min ε, max ε, ε decay, number of training iterations, and optimizer; an illustrative configuration sketch appears after this table.
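
For readers unfamiliar with the max-margin machinery behind Algorithm 1, the outer loop follows the projection-style apprenticeship-learning recipe of Abbeel & Ng (2004): alternate between fitting reward weights that separate the demonstrator's (counterfactual) feature expectations from those of the candidate policies found so far, and computing the optimal policy for those weights. The sketch below is a minimal, generic version of that loop, not the authors' code; `estimate_feature_expectations` and `solve_optimal_policy` are hypothetical stand-ins for the paper's µ-network (Algorithm 2) and Q-network training.

```python
import numpy as np

def max_margin_irl(mu_expert, estimate_feature_expectations, solve_optimal_policy,
                   n_iterations=20, tol=1e-3):
    """Projection-style max-margin IRL outer loop (Abbeel & Ng, 2004).

    mu_expert: feature expectations of the demonstrated behaviour.
    estimate_feature_expectations(policy) -> feature expectations of `policy`
        (hypothetical stand-in for the counterfactual mu-network of Algorithm 2).
    solve_optimal_policy(w) -> optimal policy for reward weights `w`
        (hypothetical stand-in for Q-network training).
    """
    # Start from an arbitrary policy (here: zero reward weights).
    policy = solve_optimal_policy(np.zeros_like(mu_expert))
    mu_bar = estimate_feature_expectations(policy)
    w = mu_expert - mu_bar

    for _ in range(n_iterations):
        # Max-margin step: weights point from the closest convex combination of
        # candidate feature expectations towards the demonstrator's.
        w = mu_expert - mu_bar
        if np.linalg.norm(w) < tol:
            break  # candidate policies already match the demonstrator
        # Best-response step: optimal policy under the current reward weights.
        policy = solve_optimal_policy(w)
        mu_new = estimate_feature_expectations(policy)
        # Projection update of mu_bar along the direction towards mu_new.
        d = mu_new - mu_bar
        step = np.dot(mu_expert - mu_bar, d) / max(np.dot(d, d), 1e-12)
        mu_bar = mu_bar + np.clip(step, 0.0, 1.0) * d

    return w, policy
```

In the paper's batch, counterfactual setting, both stand-ins must work from the observational dataset rather than from online rollouts of the environment.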
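
To make the Experiment Setup row concrete, the fields quoted from Tables 5–7 map naturally onto a small configuration object like the one sketched below. Only the field names and the 9,000 train / 1,000 validation split quoted in the Dataset Splits row come from the report; the numeric hyperparameter values are placeholders, since the actual settings in Tables 5–7 are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class QNetworkTrainingConfig:
    """Hyperparameter fields named in Tables 5-7. The values below are
    illustrative placeholders, not the settings reported in the paper."""
    lstm_size: int = 64
    batch_size: int = 128
    learning_rate: float = 1e-3
    target_network_update_m: int = 100   # "Target network update M"
    min_epsilon: float = 0.05            # "Min ε" (epsilon-greedy exploration floor)
    max_epsilon: float = 1.0             # "Max ε"
    epsilon_decay: float = 0.999         # "ε decay"
    num_training_iterations: int = 10_000
    optimizer: str = "Adam"              # the only dependency named in the paper

def train_validation_split(samples, n_train=9000, n_val=1000):
    """Split matching the reported 9,000 training / 1,000 validation samples
    per simulated batch observational dataset (validation used for
    hyperparameter optimization)."""
    assert len(samples) >= n_train + n_val
    return samples[:n_train], samples[n_train:n_train + n_val]
```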