Explainable Reinforcement Learning via a Causal World Model
Authors: Zhongwei Yu, Jingqing Ruan, Dengpeng Xing
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present examples of causal chains in two representative environments: Lunarlander-Continuous for the continuous action space, and Build-Marine for the discrete action space. To verify whether our approach can produce correct causal chains, we design an environment to measure the accuracy of recovering causal dependencies of the ground-truth AIM. To evaluate the performance of our model in MBRL, we perform experiments in two extra environments: Cartpole and Lunarlander-Discrete. |
| Researcher Affiliation | Academia | Institute of Automation, Chinese Academy of Sciences {yuzhongwei2021, ruanjingqing2019, dengpeng.xing}@ia.ac.cn |
| Pseudocode | Yes | The pseudo-code of the learning procedure is given in Appendix D. |
| Open Source Code | Yes | Our source code is available at https://github.com/EaseOnway/Explainable-Causal-Reinforcement-Learning. |
| Open Datasets | Yes | The Build-Marine environment is adapted from one of the StarCraft II mini-games in SC2LE [Samvelyan et al., 2019]; the Cartpole and Lunarlander environments are classic control problems provided by OpenAI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper describes collecting transition data into a buffer for training and updating the model but does not specify exact train/validation/test dataset splits or percentages. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Proximal Policy Optimization" and refers to a model that is "learned using PyTorch" in Appendix E.2, but it does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We first use the policy (with noise) to collect 150k samples into the buffer D. Then, we use these samples to discover the causal graph (with the threshold η = 0.05) and train the inference networks. |
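
The experiment-setup excerpt above (collect 150k transitions with a noisy policy into a buffer, discover the causal graph by thresholding at η = 0.05, then train the inference networks) can be illustrated with the minimal Python sketch below. It is not the authors' implementation: the environment/policy interfaces follow the classic OpenAI Gym API, and `score_fn`, the per-variable networks, and their `fit` method are hypothetical placeholders standing in for the paper's causal-discovery score and inference networks.

```python
import numpy as np

ETA = 0.05             # edge-selection threshold reported in the setup
NUM_SAMPLES = 150_000  # transitions collected before discovery/training


def collect_transitions(env, policy, n_samples=NUM_SAMPLES):
    """Roll out the exploration (noisy) policy and store transitions.

    Assumes the classic Gym interface: reset() -> obs,
    step(a) -> (obs, reward, done, info).
    """
    buffer = []
    state = env.reset()
    for _ in range(n_samples):
        action = policy(state)
        next_state, reward, done, _ = env.step(action)
        buffer.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state
    return buffer


def discover_causal_graph(buffer, n_inputs, n_outputs, score_fn, eta=ETA):
    """Build a boolean parent matrix by thresholding a dependence score.

    score_fn(buffer, i, j) is a hypothetical stand-in for whatever
    conditional-dependence measure is estimated; edge i -> j is kept
    only if its score exceeds eta.
    """
    graph = np.zeros((n_inputs, n_outputs), dtype=bool)
    for i in range(n_inputs):
        for j in range(n_outputs):
            graph[i, j] = score_fn(buffer, i, j) > eta
    return graph


def train_inference_networks(buffer, graph, networks, epochs=10):
    """Fit one inference network per output variable, masking its inputs
    to the parents selected in `graph` (hypothetical fit API)."""
    for _ in range(epochs):
        for j, net in enumerate(networks):
            net.fit(buffer, input_mask=graph[:, j])
```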