Fast Counterfactual Inference for History-Based Reinforcement Learning
Authors: Haichuan Gao, Tianren Zhang, Zhile Yang, Yuqing Guo, Jinsheng Ren, Shangqi Guo, Feng Chen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate T-HCI on various RL tasks with partial observability. Our experiments are designed to answer the following three questions: 1) Can T-HCI improve the sample efficiency of RL methods? 2) Is the computational overhead of T-HCI acceptable in practice? 3) Can T-HCI mine observations with causal effects? Figure 5 shows that T-HCI achieves the best sample efficiency in every sub-task. |
| Researcher Affiliation | Academia | Haichuan Gao1, Tianren Zhang1, Zhile Yang2, Yuqing Guo1, Jinsheng Ren1, Shangqi Guo1,3*, Feng Chen1,4 1Department of Automation, Tsinghua University, Beijing, China 2School of Computing, University of Leeds, Leeds, UK 3Department of Precision Instrument, Tsinghua University, Beijing, China 4LSBDPA Beijing Key Laboratory, Beijing, China |
| Pseudocode | No | The paper states 'its details are shown in Appendix A.' regarding the T-HCI algorithm. However, Appendix A is not provided in the main paper text, and no pseudocode or algorithm block is present in the main body of the paper. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | Three popular types of tasks are used to evaluate T-HCI's effectiveness to adjust for confounding: Maze, Baby AI, and Jigsaw puzzle. Maze and Baby AI tasks are commonly used as grid-like partially-observable tests (Oh et al. 2016; Loynd et al. 2020; Chevalier-Boisvert et al. 2019). ... As shown in Figure 4, we focus on 3D Jigsaw puzzle with continuous observation spaces built on Coppeliasim (Rohmer, Singh, and Freese 2013; Bogaerts et al. 2020; Gao et al. 2022). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as 'Coppeliasim' and algorithms such as 'LSTM', 'DMC', 'A2C', and 'PPO', but it does not provide version numbers for any software dependency or library. |
| Experiment Setup | No | The paper refers to Appendix E for 'More details of the baselines and parameter settings', to Appendix A.2 for 'detailed observation discretization techniques', and to Appendix A.1 for 'loss functions'. However, these appendices are not provided with the main text, and the main body of the paper does not contain specific experimental setup details such as hyperparameter values or training configurations. |