Fast Counterfactual Inference for History-Based Reinforcement Learning
Authors: Haichuan Gao, Tianren Zhang, Zhile Yang, Yuqing Guo, Jinsheng Ren, Shangqi Guo, Feng Chen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate T-HCI on various RL tasks with partial observability. Our experiments are designed to answer the following three questions: 1) Can T-HCI improve the sample efficiency of RL methods? 2) Is the computational overhead of T-HCI acceptable in practice? 3) Can T-HCI mine observations with causal effects? Figure 5 shows that T-HCI achieves the best sample efficiency in every sub-task. |
| Researcher Affiliation | Academia | Haichuan Gao1, Tianren Zhang1, Zhile Yang2, Yuqing Guo1, Jinsheng Ren1, Shangqi Guo1,3*, Feng Chen1,4 1Department of Automation, Tsinghua University, Beijing, China 2School of Computing, University of Leeds, Leeds, UK 3Department of Precision Instrument, Tsinghua University, Beijing, China 4LSBDPA Beijing Key Laboratory, Beijing, China |
| Pseudocode | No | The paper states 'its details are shown in Appendix A.' regarding the T-HCI algorithm. However, Appendix A is not provided in the main paper text, and no pseudocode or algorithm block is present in the main body of the paper. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | Three popular types of tasks are used to evaluate T-HCI's effectiveness to adjust for confounding: Maze, Baby AI, and Jigsaw puzzle. Maze and Baby AI tasks are commonly used as grid-like partially-observable tests (Oh et al. 2016; Loynd et al. 2020; Chevalier-Boisvert et al. 2019). ... As shown in Figure 4, we focus on 3D Jigsaw puzzle with continuous observation spaces built on Coppeliasim (Rohmer, Singh, and Freese 2013; Bogaerts et al. 2020; Gao et al. 2022). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as 'Coppeliasim' and algorithms such as 'LSTM', 'DMC', 'A2C', and 'PPO', but it does not provide version numbers for any software dependency or library. |
| Experiment Setup | No | The paper refers to Appendix E for 'More details of the baselines and parameter settings', to Appendix A.2 for 'detailed observation discretization techniques', and to Appendix A.1 for 'loss functions'. However, these appendices are not provided with the main text, and the main body of the paper does not contain specific experimental setup details such as hyperparameter values or training configurations. |