NovelD: A Simple yet Effective Exploration Criterion
Authors: Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate NovelD on three very challenging exploration environments: MiniGrid [13], NetHack [34] and Atari games [9]. NovelD manages to solve all the static environments in MiniGrid within 120M environment steps, without any curriculum learning. In comparison, the previous SOTA only solves 50% of them. NovelD also achieves SOTA on multiple tasks in NetHack, a rogue-like game that contains more challenging procedurally-generated environments. In multiple Atari games (e.g., Montezuma's Revenge, Venture, Gravitar), NovelD outperforms RND. |
| Researcher Affiliation | Collaboration | 1 University of California, Berkeley; 2 University of California, San Diego; 3 Tsinghua University; 4 Facebook AI Research; 5 Shanghai Qi Zhi Institute |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/tianjunz/NovelD. |
| Open Datasets | Yes | We evaluate NovelD on three very challenging exploration environments: MiniGrid [13], NetHack [34] and Atari games [9]. |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits (e.g., percentages or counts). It mentions running experiments across four seeds and in 32 randomly initialized environments for MiniGrid, but it does not describe any explicit data partitioning for validation. |
| Hardware Specification | No | The paper mentions 'computation limit' but does not provide specific hardware details such as GPU/CPU models, processors, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions using PPO as the base RL algorithm, along with CNN- and RNN-based networks, but it does not specify version numbers for any software libraries, frameworks, or dependencies. |
| Experiment Setup | Yes | NovelD manages to solve all the static environments in MiniGrid within 120M environment steps, without any curriculum learning. Results show that α = 0.5 and β = 0 works the best. |