NovelD: A Simple yet Effective Exploration Criterion

Authors: Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate NovelD on three very challenging exploration environments: MiniGrid [13], NetHack [34] and Atari games [9]. NovelD manages to solve all the static environments in MiniGrid within 120M environment steps, without any curriculum learning. In comparison, the previous SOTA only solves 50% of them. NovelD also achieves SOTA on multiple tasks in NetHack, a rogue-like game that contains more challenging procedurally-generated environments. In multiple Atari games (e.g., Montezuma's Revenge, Venture, Gravitar), NovelD outperforms RND.
Researcher Affiliation Collaboration 1University of California, Berkeley 2University of California, San Diego 3Tsinghua University 4Facebook AI Research 5Shanghai Qi Zhi Institute
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes Our code is available at https://github.com/tianjunz/NovelD.
Open Datasets Yes We evaluate NovelD on three very challenging exploration environments: MiniGrid [13], NetHack [34] and Atari games [9].
Dataset Splits No The paper does not provide specific train/validation/test dataset splits (e.g., percentages or counts). It mentions running experiments across four seeds and in 32 randomly initialized environments for MiniGrid, but not explicit data partitioning for validation.
Hardware Specification No The paper mentions 'computation limit' but does not provide specific hardware details such as GPU/CPU models, processors, or memory amounts used for running experiments.
Software Dependencies No The paper mentions using PPO as the base RL algorithm, along with CNN and RNN based networks, but it does not specify version numbers for any software libraries, frameworks, or dependencies.
Experiment Setup Yes NovelD manages to solve all the static environments in MiniGrid within 120M environment steps, without any curriculum learning. Results show that α = 0.5 and β = 0 works the best.
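For context on the α and β hyperparameters quoted above, the following is a minimal sketch of the NovelD-style intrinsic bonus as the paper describes it: a clipped difference between the novelty of the next state and a scaled novelty of the current state. The function name `noveld_bonus` and the exact clipping form are assumptions for illustration; the novelty measure itself (e.g., an RND prediction error) is supplied by the caller.

```python
def noveld_bonus(nov_next: float, nov_curr: float,
                 alpha: float = 0.5, beta: float = 0.0) -> float:
    """Sketch of a NovelD-style intrinsic reward.

    Rewards transitions that move from a familiar state to a novel one.
    With the reported best setting (alpha = 0.5, beta = 0) this reduces
    to max(nov(s') - 0.5 * nov(s), 0), so moving toward less novel
    states yields no bonus.
    """
    return max(nov_next - alpha * nov_curr, beta)


# Illustrative values: a jump into a novel state earns a bonus,
# while returning to well-explored territory earns nothing.
print(noveld_bonus(1.0, 1.0))  # frontier transition: 0.5
print(noveld_bonus(0.2, 1.0))  # backtracking: clipped to 0.0
```

This is only a schematic of the criterion under the stated assumptions; the paper's full method pairs it with an episodic restriction (granting the bonus only on first visits within an episode), which is omitted here.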