Causality-driven Hierarchical Structure Discovery for Reinforcement Learning
Authors: Shaohui Peng, Xing Hu, Rui Zhang, Ke Tang, Jiaming Guo, Qi Yi, Ruizhi Chen, Xishan Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results in two complex environments, 2D-Minecraft and Eden, show that CDHRL significantly boosts exploration efficiency with the causality-driven paradigm. We verified our method in two typical complex tasks, including 2D-Minecraft [30] and the simplified sandbox survival game Eden [4]. The results show that CDHRL discovers high-quality hierarchical structures and significantly enhances exploration efficiency. |
| Researcher Affiliation | Collaboration | 1 SKL of Processors, Institute of Computing Technology, CAS; 2 University of Chinese Academy of Sciences; 3 Cambricon Technologies; 4 Department of Computer Science and Engineering, Southern University of Science and Technology; 5 SKL of Computer Science, Institute of Software, CAS; 6 University of Science and Technology of China |
| Pseudocode | Yes | (The pseudo-code of the framework is in Appendix B.1) |
| Open Source Code | Yes | We offer the code of our method in the supplemental material. |
| Open Datasets | Yes | We verified our method in two typical complex tasks, including 2D-Minecraft [30] and the simplified sandbox survival game Eden [4] |
| Dataset Splits | No | The paper mentions training stages (pre-training and adaptation) but does not provide specific percentages or counts for training, validation, or test dataset splits. |
| Hardware Specification | No | The paper explicitly states: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]' |
| Software Dependencies | No | The paper mentions implementing the agent based on multi-level DQN with HER and that SCM is implemented like SDI, but it does not specify software versions for libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | The training is divided into two stages: pre-training and adaptation. In the pre-training stage, we train multi-level subgoal-based policies until no new causality is discovered. In the adaptation stage, we train an upper controller on the pre-trained subgoals to maximize the task reward. The SCM is implemented like SDI [16]. We set the initial subgoal distribution as the uniform distribution on G_EV ∪ G_S and pre-train the subgoals for a sufficient number of steps. To verify whether newly associated variables are controllable as supposed, we compare the final training success rates of the new subgoals with a preset threshold φ_causal before adding them to the subgoal hierarchy. (A minimal sketch of this two-stage loop appears after the table.) |
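
The Experiment Setup row describes a two-stage procedure: grow a subgoal hierarchy during pre-training until causal discovery finds no new controllable variables, gating each candidate subgoal on a success-rate threshold φ_causal, then train an upper controller over the discovered subgoals. The sketch below is a minimal illustration of that loop only; all names (`discover_new_causal_variables`, `train_subgoal_policy`, `PHI_CAUSAL`, the toy variable list) are hypothetical placeholders and the causal-discovery and policy-training steps are stubbed out, so this is not the authors' released implementation.

```python
# Hypothetical sketch of the CDHRL two-stage training described above.
# Placeholders stand in for SCM/SDI-style causal discovery and for the
# multi-level DQN + HER subgoal policies mentioned in the paper.

import random

PHI_CAUSAL = 0.9  # assumed success-rate threshold for accepting a new subgoal


def discover_new_causal_variables(hierarchy, env_variables):
    """Placeholder for SCM-based causal discovery (SDI-style in the paper):
    return environment variables newly associated with current subgoals."""
    return [v for v in env_variables if v not in hierarchy and random.random() < 0.5]


def train_subgoal_policy(variable, steps=10_000):
    """Placeholder for training a goal-conditioned policy (e.g. DQN + HER)
    to control `variable`; returns its final training success rate."""
    return random.random()


def pretrain_subgoal_hierarchy(env_variables, max_rounds=50):
    """Pre-training stage: grow the subgoal hierarchy until no new causality is found."""
    hierarchy = []
    for _ in range(max_rounds):
        candidates = discover_new_causal_variables(hierarchy, env_variables)
        if not candidates:
            break  # no new causality discovered -> pre-training ends
        for var in candidates:
            success_rate = train_subgoal_policy(var)
            # Verify the newly associated variable is actually controllable
            # before adding it to the subgoal hierarchy.
            if success_rate >= PHI_CAUSAL:
                hierarchy.append(var)
    return hierarchy


def adapt_upper_controller(hierarchy, episodes=100):
    """Adaptation stage: train an upper controller over the pre-trained subgoals
    to maximize the task reward (placeholder returns dummy preference scores)."""
    return {goal: random.random() for goal in hierarchy}


if __name__ == "__main__":
    toy_variables = ["wood", "plank", "stick", "stone_pickaxe"]  # illustrative only
    subgoals = pretrain_subgoal_hierarchy(toy_variables)
    controller = adapt_upper_controller(subgoals)
    print("Discovered subgoals:", subgoals)
```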