Causal Deep Reinforcement Learning Using Observational Data
Authors: Wenxuan Zhu, Chao Yu, Qiang Zhang
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove the effectiveness of our deconfounding methods and validate them experimentally. The experimental results verify that the proposed deconfounding methods are effective: offline RL algorithms using deconfounding methods perform better on datasets with the confounders. |
| Researcher Affiliation | Academia | (1) Dalian University of Technology, (2) Sun Yat-sen University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "All the implementations of the offline RL algorithms in this paper follow d3rlpy, an offline RL library [Takuma Seno, 2021]." This refers to a third-party library used by the authors, not to a release of their own code for the proposed methods. No explicit statement or link to the authors' code is provided. |
| Open Datasets | Yes | The paper states: "we design four benchmark tasks, namely, Emotional Pendulum, Windy Pendulum, Emotional Pendulum*, and Windy Pendulum*, by modifying the Pendulum task in the Open AI Gym [Brockman et al., 2016]." (An illustrative sketch of such an environment modification appears below the table.) |
| Dataset Splits | No | The paper states: "The rewards are tested over 20 episodes every 1000 learning steps, and averaged over 5 random seeds." While it mentions testing, it does not provide specific training, validation, or test dataset split percentages or sample counts to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using "d3rlpy, an offline RL library [Takuma Seno, 2021]" but does not specify version numbers for d3rlpy or for other key software components such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | All the hyperparameters of the offline RL algorithms are set to the default values of d3rlpy. The rewards are tested over 20 episodes every 1000 learning steps, and averaged over 5 random seeds. Other hyperparameters and the implementation details are described in Appendix C. (A hedged sketch of this evaluation schedule also follows the table.) |
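
The benchmark tasks cited under "Open Datasets" are built by modifying OpenAI Gym's Pendulum task. The paper's exact modifications are described in its appendix, so the snippet below is only a minimal illustrative sketch: it assumes the classic Gym API (4-tuple `step` return) and a hypothetical `ConfoundedPendulumWrapper` that injects an unobserved, per-episode disturbance into the applied torque, which is the kind of latent confounder the paper's tasks are meant to expose.

```python
import numpy as np
import gym  # classic Gym API assumed (Brockman et al., 2016), not gymnasium


class ConfoundedPendulumWrapper(gym.Wrapper):
    """Hypothetical wrapper: adds a latent, per-episode disturbance to Pendulum.

    The disturbance biases the applied torque but never appears in the
    observation, so trajectories collected from this environment contain an
    unobserved confounder. This is NOT the paper's actual task definition,
    only a sketch of the general idea.
    """

    def __init__(self, env, confounder_scale=0.5):
        super().__init__(env)
        self.confounder_scale = confounder_scale
        self._confounder = 0.0

    def reset(self, **kwargs):
        # Draw a fresh latent confounder at the start of every episode.
        self._confounder = np.random.uniform(-1.0, 1.0)
        return self.env.reset(**kwargs)

    def step(self, action):
        # Perturb the agent's torque with the hidden confounder before stepping.
        perturbed = np.clip(
            np.asarray(action) + self.confounder_scale * self._confounder,
            self.action_space.low,
            self.action_space.high,
        )
        return self.env.step(perturbed)


if __name__ == "__main__":
    # The environment ID depends on the installed gym version
    # ("Pendulum-v0" in older releases, "Pendulum-v1" in newer ones).
    env = ConfoundedPendulumWrapper(gym.make("Pendulum-v1"))
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
```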
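
The "Experiment Setup" row quotes a concrete evaluation schedule: rewards tested over 20 episodes every 1000 learning steps, averaged over 5 random seeds, with algorithm hyperparameters left at d3rlpy defaults. The sketch below mirrors that schedule; `make_algo`, `train_steps`, and `policy` are hypothetical placeholders rather than d3rlpy's actual API, and `total_steps` is an assumed value not taken from the paper.

```python
import numpy as np


def evaluate(policy, env, n_episodes=20):
    """Mean episodic return of a (hypothetical) policy over n_episodes rollouts."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))


def run_protocol(make_algo, env, total_steps=100_000, eval_every=1_000,
                 seeds=(0, 1, 2, 3, 4)):
    """Sketch of the quoted schedule: evaluate every 1000 learning steps over
    20 episodes and average the resulting curves over 5 random seeds."""
    curves = []
    for seed in seeds:
        np.random.seed(seed)
        algo = make_algo(seed)            # hypothetical factory, e.g. an offline RL learner
        scores = []
        for _ in range(0, total_steps, eval_every):
            algo.train_steps(eval_every)  # hypothetical training call, not a real d3rlpy method
            scores.append(evaluate(algo.policy, env))  # `policy` attribute is assumed
        curves.append(scores)
    return np.mean(curves, axis=0)        # per-checkpoint mean over seeds
```

In practice the placeholder training and policy calls would be replaced by the corresponding d3rlpy routines, with the offline datasets and remaining hyperparameters described in the paper's Appendix C.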