Causal Deep Reinforcement Learning Using Observational Data

Authors: Wenxuan Zhu, Chao Yu, Qiang Zhang

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove the effectiveness of our deconfounding methods and validate them experimentally. The experimental results verify that the proposed deconfounding methods are effective: offline RL algorithms using deconfounding methods perform better on datasets with the confounders.
Researcher Affiliation | Academia | Dalian University of Technology; Sun Yat-sen University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "All the implementations of the offline RL algorithms in this paper follow d3rlpy, an offline RL library [Takuma Seno, 2021]." This refers to a third-party library that was used, not to the authors' own code for their proposed methods. No explicit statement or link to the authors' code is provided.
Open Datasets | Yes | we design four benchmark tasks, namely, Emotional Pendulum, Windy Pendulum, Emotional Pendulum*, and Windy Pendulum*, by modifying the Pendulum task in OpenAI Gym [Brockman et al., 2016]. (An illustrative sketch of such a modified Pendulum environment is given after the table.)
Dataset Splits | No | The paper states: "The rewards are tested over 20 episodes every 1000 learning steps, and averaged over 5 random seeds." While it mentions testing, it does not provide specific training, validation, or test dataset split percentages or sample counts to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using "d3rlpy, an offline RL library [Takuma Seno, 2021]" but does not specify the version of d3rlpy or of other key software components such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | All the hyperparameters of the offline RL algorithms are set to the default values of d3rlpy. The rewards are tested over 20 episodes every 1000 learning steps, and averaged over 5 random seeds. Other hyperparameters and the implementation details are described in Appendix C. (A hedged sketch of this training and evaluation protocol appears below the table.)
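
The Open Datasets row notes that the benchmark tasks were built by modifying Gym's Pendulum task. As a rough illustration only, a confounded variant can be sketched as a wrapper that injects a hidden disturbance into every transition. The class name `WindyPendulumWrapper`, the `wind_scale` parameter, and the uniform wind model below are assumptions for illustration, not the paper's actual Emotional/Windy Pendulum definitions (those are specified in the paper and its appendix); the classic pre-0.26 Gym API (4-tuple `step`, observation-only `reset`) is also assumed.

```python
# Illustrative only: a hidden-confounder variant of Gym's Pendulum.
# The wrapper name, the wind model, and `wind_scale` are assumptions,
# not the paper's actual Windy Pendulum definition. Classic Gym API
# (4-tuple step, obs-only reset; gym <= 0.25) is assumed.
import numpy as np
import gym


class WindyPendulumWrapper(gym.Wrapper):
    """Adds a hidden 'wind' torque that perturbs every transition."""

    def __init__(self, env, wind_scale=1.0):
        super().__init__(env)
        self.wind_scale = wind_scale
        self.wind = 0.0  # hidden confounder, resampled each episode

    def reset(self, **kwargs):
        self.wind = np.random.uniform(-self.wind_scale, self.wind_scale)
        return self.env.reset(**kwargs)

    def step(self, action):
        # The confounder shifts the torque actually applied to the pendulum.
        # If the behaviour policy that collects the offline data also reacts
        # to the wind while the logged observations omit it, the wind acts as
        # an unobserved confounder in the resulting dataset.
        perturbed = np.clip(action + self.wind,
                            self.action_space.low, self.action_space.high)
        obs, reward, done, info = self.env.step(perturbed)
        info["confounder"] = self.wind  # handy for deconfounded baselines
        return obs, reward, done, info


if __name__ == "__main__":
    env = WindyPendulumWrapper(gym.make("Pendulum-v1"))  # "Pendulum-v0" on older Gym
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    print(reward, info["confounder"])
```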
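
The Experiment Setup row reports default d3rlpy hyperparameters, evaluation over 20 episodes every 1000 learning steps, and averaging over 5 random seeds. A minimal sketch of that protocol against the d3rlpy 1.x API (the release cited as [Takuma Seno, 2021]) is shown below; the choice of CQL, the stock `get_pendulum()` dataset, and the 100,000-step budget are placeholders rather than details taken from the paper.

```python
# Hedged sketch of the reported evaluation protocol (20 episodes every
# 1000 learning steps, averaged over 5 seeds) using the d3rlpy 1.x API.
# CQL, the stock get_pendulum() dataset, and the step budget are assumed
# placeholders; the paper's own data comes from its confounded Pendulum tasks.
import numpy as np
import d3rlpy
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pendulum
from d3rlpy.metrics.scorer import evaluate_on_environment

curves = []
for seed in range(5):                      # 5 random seeds
    d3rlpy.seed(seed)
    dataset, env = get_pendulum()          # stand-in for the confounded datasets
    algo = CQL()                           # default hyperparameters
    results = algo.fit(                    # fit() returns [(epoch, metrics), ...]
        dataset,
        n_steps=100_000,                   # total budget (not given in the paper)
        n_steps_per_epoch=1000,            # -> evaluation every 1000 steps
        eval_episodes=dataset.episodes,
        scorers={"environment": evaluate_on_environment(env, n_trials=20)},
    )
    curves.append([metrics["environment"] for _, metrics in results])

mean_return = np.mean(curves, axis=0)      # seed-averaged learning curve
```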