PAE: Reinforcement Learning from External Knowledge for Efficient Exploration

Authors: Zhe Wu, Haofei Lu, Junliang Xing, You Wu, Renye Yan, Yaozhong Gan, Yuanchun Shi

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments across 11 challenging tasks from the BabyAI and MiniHack environment suites demonstrate PAE's superior exploration efficiency with good interpretability.
Researcher Affiliation | Academia | Zhe Wu (1), Haofei Lu (2), Junliang Xing (2), You Wu (1,3), Renye Yan (1,4), Yaozhong Gan (1), Yuanchun Shi (1,2); (1) Qi Yuan Lab, (2) Department of Computer Science and Technology, Tsinghua University, (3) Nanjing University, (4) Peking University
Pseudocode | Yes | Algorithm 1 in Appendix A.4.1 summarizes our PAE procedure.
Open Source Code | No | The paper states 'We implemented the PAE and reproduction baseline algorithms using TorchBeast (Küttler et al., 2019)' and provides a footnote link to the TorchBeast GitHub repository (https://github.com/facebookresearch/torchbeast). However, this links to the third-party framework used, not to the authors' own PAE implementation.
Open Datasets | Yes | We evaluated our method across two task types, totaling six environments within the BabyAI suite: Key Corridor tasks (KEYCORRS3R3, KEYCORRS4R3, KEYCORRS5R3) and Obstructed Maze tasks (OBSTRMAZE1DL, OBSTRMAZE2DLHB, OBSTRMAZE1Q). ...we extended our PAE approach to the more challenging MiniHack tasks. MiniHack consists of procedurally generated tasks... Our comparisons of PAE with baseline methods encompassed five MiniHack environments: LAVACROSS-RING, LAVACROSS-POTION, LAVACROSS-FULL, RIVER-MONSTER, and MULTIROOM-N4-MONSTER. See Appendix A.2 for more details.
Dataset Splits | No | The paper does not provide percentages or counts for training, validation, and test splits. It describes the environments used for evaluation but not any data partitioning methodology.
Hardware Specification | Yes | Each model was trained using five independent seeds on a system with 112 Intel Xeon Platinum 8280 cores and 6 NVIDIA RTX 3090 GPUs.
Software Dependencies | Yes | We implemented the PAE and reproduction baseline algorithms using TorchBeast (Küttler et al., 2019)... Meanwhile, the Planner utilized a pre-trained BERT model with frozen parameters to understand the semantics and encode knowledge.
Experiment Setup | Yes | For PAE, we ran a grid search over batch size {8, 32, 150}, unroll length {20, 40, 100, 200}, entropy cost for the Actor {0.0001, 0.0005, 0.001}, the Actor learning rate {0.0001, 0.0005, 0.001}, the Planner learning rate {0.0001, 0.0005, 0.001}, and entropy cost for the Planner {0.001, 0.005, 0.01}. Table 5 shows the best parameters obtained from the search.
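The reported search space is a plain Cartesian product of six hyperparameter sets (3 × 4 × 3 × 3 × 3 × 3 = 972 configurations). A minimal sketch of such an exhaustive grid search is shown below; `run_trial` is a hypothetical stand-in for actually training PAE with a configuration and scoring it, not part of the paper's released code.

```python
from itertools import product

# Hyperparameter grid as reported in the paper (Table 5 lists the selected values).
grid = {
    "batch_size": [8, 32, 150],
    "unroll_length": [20, 40, 100, 200],
    "actor_entropy_cost": [0.0001, 0.0005, 0.001],
    "actor_lr": [0.0001, 0.0005, 0.001],
    "planner_lr": [0.0001, 0.0005, 0.001],
    "planner_entropy_cost": [0.001, 0.005, 0.01],
}

def run_trial(config):
    """Hypothetical stand-in: train PAE with `config` and return a score
    (e.g. mean episodic return). Replace with a real training run."""
    return 0.0

def grid_search(grid, evaluate):
    """Exhaustively evaluate every configuration and keep the best one."""
    keys = list(grid)
    best_score, best_config = float("-inf"), None
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

n_configs = len(list(product(*grid.values())))  # 3*4*3*3*3*3 = 972
best_config, best_score = grid_search(grid, run_trial)
```

In practice each of the 972 configurations would be run under several seeds (the paper uses five), so a real sweep would parallelize `run_trial` across workers rather than loop sequentially.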