Learning "What-if" Explanations for Sequential Decision-Making
Authors: Ioana Bica, Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through illustrative experiments in both real and simulated medical environments, we highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior. |
| Researcher Affiliation | Academia | Ioana Bica (University of Oxford, Oxford, UK; The Alan Turing Institute, London, UK), ioana.bica@eng.ox.ac.uk; Daniel Jarrett (University of Cambridge, Cambridge, UK), daniel.jarrett@maths.cam.ac.uk; Alihan Hüyük (University of Cambridge, Cambridge, UK), ah2075@cam.ac.uk; Mihaela van der Schaar (University of Cambridge, Cambridge, UK; Cambridge Center for AI in Medicine, UK; University of California, Los Angeles, USA; The Alan Turing Institute, London, UK), mv472@cam.ac.uk |
| Pseudocode | Yes | Algorithm 1 ((Batch, Max-Margin) CIRL) and Algorithm 2 (Counterfactual µ learning); a generic sketch of the underlying max-margin loop follows the table. |
| Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We also perform a case study on an ICU dataset from the MIMIC III database (Johnson et al., 2016). |
| Dataset Splits | Yes | For each simulated batch observational dataset, we use 9000 samples for training the Counterfactual Recurrent Network and 1000 for validation (hyperparameter optimization). |
| Hardware Specification | Yes | The experiments were run on a system with 2 NVIDIA K80 Tesla GPUs, 12 CPUs, and 112 GB of RAM. |
| Software Dependencies | No | The paper lists 'Adam' as the optimizer in several hyperparameter tables (e.g., Tables 5, 6, and 7) but does not provide version numbers for any software libraries or dependencies used in the implementation, such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | Table 5: Hyperparameters for training the µ-network that estimates feature expectations. Table 6: Hyperparameters for training the Q-network that finds the optimal policy for a given setting of the reward weights. Table 7: Hyperparameters for training the Q-network that solves the simulated environment. These tables list specific values for LSTM size, batch size, learning rate, target network update M, min ε, max ε, ε decay, number of training iterations, and optimizer; an illustrative configuration sketch follows the table. |
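
For context on the Pseudocode row: the paper's Algorithm 1 builds on max-margin apprenticeship learning, with Algorithm 2 supplying counterfactual feature expectations estimated from batch data. The snippet below is a minimal sketch of the classical projection variant of that loop (Abbeel & Ng, 2004), not the paper's implementation; `estimate_mu` and `solve_policy` are hypothetical callables standing in for the roles played by the µ-network (Algorithm 2) and the Q-network.

```python
import numpy as np

def max_margin_irl(mu_expert, estimate_mu, solve_policy, eps=1e-3, max_iters=50):
    """Projection variant of max-margin apprenticeship learning (Abbeel & Ng, 2004).

    mu_expert    : feature expectations of the demonstrated behaviour, shape (d,)
    estimate_mu  : policy -> estimated feature expectations, shape (d,)
    solve_policy : reward weights w -> (approximately) optimal policy for r(s) = w @ phi(s)
    """
    # Initialise with an arbitrary policy and its feature expectations.
    pi = solve_policy(np.random.randn(*mu_expert.shape))
    mu_bar = estimate_mu(pi)
    w = mu_expert - mu_bar

    for _ in range(max_iters):
        if np.linalg.norm(w) <= eps:        # expert behaviour is (nearly) matched
            break
        pi = solve_policy(w)                # forward RL step under the current reward
        mu = estimate_mu(pi)
        # Orthogonal projection of mu_expert onto the line through mu_bar and mu.
        d = mu - mu_bar
        mu_bar = mu_bar + (d @ (mu_expert - mu_bar)) / (d @ d) * d
        w = mu_expert - mu_bar              # new max-margin reward direction

    return w, pi
```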
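
For the Experiment Setup row: the hyperparameter names in Tables 5-7 map onto a standard recurrent Q-learning configuration with an ε-greedy exploration schedule. The sketch below shows that mapping with illustrative placeholder values only; the actual values are those reported in the paper's tables.

```python
# Illustrative placeholder values -- not the paper's settings; field names
# mirror the hyperparameters listed in Tables 5-7.
q_network_config = {
    "lstm_hidden_size": 64,        # "LSTM size": recurrent state encoder width
    "batch_size": 128,
    "learning_rate": 1e-3,
    "target_update_every": 100,    # "Target network update M"
    "epsilon_max": 1.0,            # "Max ε": initial exploration rate
    "epsilon_min": 0.05,           # "Min ε": floor on exploration
    "epsilon_decay": 0.999,        # "ε decay": multiplicative decay per iteration
    "num_iterations": 10_000,      # "Number of training iterations"
    "optimizer": "Adam",
}

def epsilon_at(step, cfg=q_network_config):
    """ε-greedy schedule implied by the (Min ε, Max ε, ε decay) fields."""
    return max(cfg["epsilon_min"], cfg["epsilon_max"] * cfg["epsilon_decay"] ** step)
```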