NetHack is Hard to Hack

Authors: Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we delve into the reasons behind this performance gap and present an extensive study on neural policy learning for NetHack. In this work, we conduct a comprehensive study of NetHack and examine various learning mechanisms to enhance the performance of neural models. Our main findings are as follows:
Researcher Affiliation | Academia | Ulyana Piterbarg (NYU), Lerrel Pinto (NYU), Rob Fergus (NYU)
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found.
Open Source Code | Yes | Additionally, we open-source our code, models, and the HiHack repository, which includes (i) our 10^9 dataset of hierarchical labels obtained from AutoAscend and (ii) the augmented AutoAscend and NLE code employed for hierarchical data generation, encouraging further development. Code is available at https://github.com/upiterbarg/hihack.
Open Datasets | Yes | Our goal in generating the HiHack Dataset (HiHack) is to create a hierarchically-informed analogue of the large-scale AutoAscend demonstration corpus of NLD, NLD-AA. Additionally, we open-source our code, models, and the HiHack repository, which includes (i) our 10^9 dataset of hierarchical labels obtained from AutoAscend and (ii) the augmented AutoAscend and NLE code employed for hierarchical data generation. (A hypothetical record-layout sketch follows the table.)
Dataset Splits | No | No explicit training/validation/test splits with specific percentages or counts for a distinct validation set are provided. The paper trains on the HiHack Dataset and evaluates on "withheld NLE instances" (with a rolling NLE score proxy metric during training), but a formal validation split is not defined. (A seed-split sketch follows the table.)
Hardware Specification | Yes | Experiments were run on compute nodes on a private high-performance computing (HPC) cluster, each equipped with either an NVIDIA RTX-8000 or an NVIDIA A100 GPU, as well as 16 CPU cores.
Software Dependencies | No | The PyTorch library was used to specify all models, loss functions, and optimizers [43]. All models were trained with the Adam optimizer [31] and a fixed learning rate. No specific version numbers for software dependencies were found. (A training-setup sketch follows the table.)
Experiment Setup | Yes | All relevant training hyperparameter values, across model families as well as BC vs. APPO + BC experiment variants, are displayed in Table 3.
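
The Open Datasets row references HiHack's hierarchical labels, which pair AutoAscend game recordings with the bot's strategy annotations. Since the report does not reproduce the dataset's schema, the following is a minimal, hypothetical sketch of what one hierarchically-labeled demonstration step might look like; every field and function name here is an illustrative assumption, not the actual HiHack format.

```python
from dataclasses import dataclass

# Hypothetical record layout for one hierarchically-labeled demonstration
# step. Field names are illustrative assumptions, NOT the real HiHack
# schema: the real dataset pairs AutoAscend game recordings with the
# bot's high-level strategy labels.

@dataclass
class LabeledStep:
    tty_chars: bytes  # terminal observation at this timestep
    action: int       # low-level keypress emitted by AutoAscend
    strategy: str     # high-level AutoAscend routine active at this step
                      # (e.g. "fight", "explore"); the hierarchical label

def group_by_strategy(trajectory):
    """Split one demonstration into contiguous same-strategy segments,
    the kind of structure a hierarchical policy could be trained on."""
    segments, current = [], []
    for step in trajectory:
        if current and step.strategy != current[-1].strategy:
            segments.append(current)
            current = []
        current.append(step)
    if current:
        segments.append(current)
    return segments
```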
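
The Dataset Splits row notes that no formal validation split is defined and that evaluation uses withheld NLE instances. As a minimal sketch of one way such a protocol can be realized, the snippet below partitions game seeds at the episode level; the seed counts and the function itself are assumptions for illustration, not the paper's procedure.

```python
import random

def split_episode_seeds(n_seeds=10_000, n_heldout=500, rng_seed=0):
    """Partition NetHack game seeds into training and withheld
    evaluation sets. Splitting at the episode level (rather than the
    frame level) avoids leaking frames from an evaluation game into
    training. All counts here are illustrative assumptions, not values
    from the paper."""
    rng = random.Random(rng_seed)
    seeds = list(range(n_seeds))
    rng.shuffle(seeds)
    return seeds[n_heldout:], seeds[:n_heldout]  # train, heldout

train_seeds, eval_seeds = split_episode_seeds()
assert not set(train_seeds) & set(eval_seeds)
```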
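
The Software Dependencies row quotes that PyTorch was used to specify all models, loss functions, and optimizers, trained with Adam at a fixed learning rate, but records no versions or values. Below is a minimal sketch of that setup, assuming a behavioral-cloning objective; the architecture, action-space size, and learning rate are placeholders rather than the paper's Table 3 values.

```python
import torch
import torch.nn as nn

# Minimal stand-in model; the paper's actual policy architectures over
# NLE observations are not reproduced here.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 121))

# Behavioral cloning treats action prediction as classification over the
# NLE action space, so cross-entropy is a natural choice (an assumption
# here, not a quote from the paper).
loss_fn = nn.CrossEntropyLoss()

# Adam with a fixed learning rate, per the quoted setup; 1e-4 is a
# placeholder, not the value from the paper's Table 3.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(obs, actions):
    """One supervised update: obs is a float tensor of shape (B, 128),
    actions a long tensor of shape (B,)."""
    optimizer.zero_grad()
    loss = loss_fn(model(obs), actions)
    loss.backward()
    optimizer.step()
    return loss.item()
```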