NetHack is Hard to Hack

Authors: Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we delve into the reasons behind this performance gap and present an extensive study on neural policy learning for NetHack. In this work, we conduct a comprehensive study of NetHack and examine various learning mechanisms to enhance the performance of neural models. Our main findings are as follows:
Researcher Affiliation | Academia | Ulyana Piterbarg (NYU), Lerrel Pinto (NYU), Rob Fergus (NYU)
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found.
Open Source Code | Yes | Additionally, we open-source our code, models, and the HiHack repository, which includes (i) our 10^9 dataset of hierarchical labels obtained from AutoAscend and (ii) the augmented AutoAscend and NLE code employed for hierarchical data generation, encouraging further development. Code is available at https://github.com/upiterbarg/hihack.
Open Datasets | Yes | Our goal in generating the HiHack Dataset (HiHack) is to create a hierarchically-informed analogue of the large-scale AutoAscend demonstration corpus of NLD, NLD-AA. Additionally, we open-source our code, models, and the HiHack repository, which includes (i) our 10^9 dataset of hierarchical labels obtained from AutoAscend and (ii) the augmented AutoAscend and NLE code employed for hierarchical data generation. (A hypothetical record-layout sketch follows the table.)
Dataset Splits | No | No explicit training/validation/test splits with specific percentages or counts for a distinct validation set are provided. The paper trains on the HiHack Dataset and evaluates on "withheld NLE instances" (with a rolling NLE score proxy metric during training), but a formal validation split is not defined. (A seed-split sketch follows the table.)
Hardware Specification | Yes | Experiments were run on compute nodes on a private high-performance computing (HPC) cluster, each equipped with either an NVIDIA RTX-8000 or an NVIDIA A100 GPU, as well as 16 CPU cores.
Software Dependencies | No | The PyTorch library was used to specify all models, loss functions, and optimizers [43]. All models were trained with the Adam optimizer [31] and a fixed learning rate. No specific version numbers for software dependencies were found. (A training-setup sketch follows the table.)
Experiment Setup | Yes | All relevant training hyperparameter values, across model families as well as BC vs. APPO + BC experiment variants, are displayed in Table 3.
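
The Open Datasets row references HiHack's hierarchical labels, which pair AutoAscend game recordings with the bot's strategy annotations. Since the report does not reproduce the dataset's schema, the following is a minimal, hypothetical sketch of what one hierarchically-labeled demonstration step might look like; every field and function name here is an illustrative assumption, not the actual HiHack format.

```python
from dataclasses import dataclass

# Hypothetical record layout for one hierarchically-labeled demonstration
# step. Field names are illustrative assumptions, NOT the real HiHack
# schema: the real dataset pairs AutoAscend game recordings with the
# bot's high-level strategy labels.

@dataclass
class LabeledStep:
    tty_chars: bytes  # terminal observation at this timestep
    action: int       # low-level keypress emitted by AutoAscend
    strategy: str     # high-level AutoAscend routine active at this step
                      # (e.g. "fight", "explore"); the hierarchical label

def group_by_strategy(trajectory):
    """Split one demonstration into contiguous same-strategy segments,
    the kind of structure a hierarchical policy could be trained on."""
    segments, current = [], []
    for step in trajectory:
        if current and step.strategy != current[-1].strategy:
            segments.append(current)
            current = []
        current.append(step)
    if current:
        segments.append(current)
    return segments
```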
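
The Dataset Splits row notes that no formal validation split is defined and that evaluation uses withheld NLE instances. As a minimal sketch of one way such a protocol can be realized, the snippet below partitions game seeds at the episode level; the seed counts and the function itself are assumptions for illustration, not the paper's procedure.

```python
import random

def split_episode_seeds(n_seeds=10_000, n_heldout=500, rng_seed=0):
    """Partition NetHack game seeds into training and withheld
    evaluation sets. Splitting at the episode level (rather than the
    frame level) avoids leaking frames from an evaluation game into
    training. All counts here are illustrative assumptions, not values
    from the paper."""
    rng = random.Random(rng_seed)
    seeds = list(range(n_seeds))
    rng.shuffle(seeds)
    return seeds[n_heldout:], seeds[:n_heldout]  # train, heldout

train_seeds, eval_seeds = split_episode_seeds()
assert not set(train_seeds) & set(eval_seeds)
```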
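
The Software Dependencies row quotes that PyTorch was used to specify all models, loss functions, and optimizers, trained with Adam at a fixed learning rate, but records no versions or values. Below is a minimal sketch of that setup, assuming a behavioral-cloning objective; the architecture, action-space size, and learning rate are placeholders rather than the paper's Table 3 values.

```python
import torch
import torch.nn as nn

# Minimal stand-in model; the paper's actual policy architectures over
# NLE observations are not reproduced here.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 121))

# Behavioral cloning treats action prediction as classification over the
# NLE action space, so cross-entropy is a natural choice (an assumption
# here, not a quote from the paper).
loss_fn = nn.CrossEntropyLoss()

# Adam with a fixed learning rate, per the quoted setup; 1e-4 is a
# placeholder, not the value from the paper's Table 3.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(obs, actions):
    """One supervised update: obs is a float tensor of shape (B, 128),
    actions a long tensor of shape (B,)."""
    optimizer.zero_grad()
    loss = loss_fn(model(obs), actions)
    loss.backward()
    optimizer.step()
    return loss.item()
```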