The NetHack Learning Environment

Authors: Heinrich Küttler, Nantas Nardelli, Alexander Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, Tim Rocktäschel

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration, alongside qualitative analysis of various agents trained in the environment.
Researcher Affiliation | Collaboration | Facebook AI Research, University of Oxford, New York University, Imperial College London, University College London
Pseudocode | No | The paper describes the model architecture but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | NLE is open source and available at https://github.com/facebookresearch/nle. A minimal usage sketch appears after the table.
Open Datasets | Yes | The paper states "Throughout training, we change NetHack's seed for procedurally generating the environment after every episode." and "Training only on a limited number of seeds leads to high training performance, but poor generalization. The gap between training and test performance becomes narrow when training with at least 1000 seeds..."
Dataset Splits | Yes | "Training only on a limited number of seeds leads to high training performance, but poor generalization. The gap between training and test performance becomes narrow when training with at least 1000 seeds, indicating that at that point agents are exposed to sufficient variation during training to make memorization infeasible." A sketch of this train/test seed split appears after the table.
Hardware Specification | No | The paper states "We train our models on a single GPU with 16GB of memory.", "We use 256 IMPALA actors, each running on a single CPU core.", and "Our environment implementation achieves up to 20K frames per second (fps) single-threaded on a single CPU core". While memory and core count are given, specific GPU or CPU models are not provided.
Software Dependencies | Yes | We use TorchBeast [44] (PyTorch 1.3) for distributed RL.
Experiment Setup | Yes | The paper states "For the main experiments, we train the agent's policy for 1B steps in the environment using IMPALA [24] as implemented in TorchBeast [44].", "The learner updates the network parameters every 10 steps using Adam optimizer [45] with a learning rate of 1e-4, epsilon of 1e-3, and weight decay of 1e-5.", "All models are trained for a total of 10B frames.", "We use a discount factor of 0.999.", and "We use a batch size of 256." These hyperparameters are mirrored in a sketch after the table.
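
To complement the open-source-code entry, here is a minimal sketch of NLE's Gym-style interface as described in the repository README. The task name "NetHackScore-v0" and the classic four-tuple step API reflect the NLE release accompanying the paper and may differ in later versions.

    import gym
    import nle  # importing nle registers the NetHack tasks with Gym

    # Create the score-maximization task used as the main benchmark.
    env = gym.make("NetHackScore-v0")

    obs = env.reset()  # each reset procedurally generates a new dungeon
    done = False
    while not done:
        # Random policy, purely for illustration.
        obs, reward, done, info = env.step(env.action_space.sample())
    env.close()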
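
The dataset-splits entry refers to a generalization study in which agents are trained on a restricted pool of seeds and evaluated on unseen ones. The sketch below only illustrates that setup: the helper reset_with_seed is hypothetical, and env.seed(...) is assumed to take a single integer, which may not match NLE's actual seeding interface.

    import random

    TRAIN_SEEDS = list(range(1000))  # "at least 1000 seeds" narrows the train/test gap

    def reset_with_seed(env, training):
        """Hypothetical helper: pick a seed from the training pool during
        training, or a fresh unseen seed for evaluation, then reset."""
        if training:
            seed = random.choice(TRAIN_SEEDS)
        else:
            seed = random.randint(len(TRAIN_SEEDS), 10**9)  # outside the training pool
        env.seed(seed)  # assumed single-integer signature
        return env.reset()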
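
The experiment-setup quotes fix the learner hyperparameters precisely; they can be mirrored in PyTorch as follows. The network here is a placeholder and does not reproduce the paper's policy/value architecture.

    import torch

    UNROLL_LENGTH = 10   # learner updates parameters every 10 steps
    BATCH_SIZE = 256
    DISCOUNT = 0.999     # discount factor (gamma)

    model = torch.nn.Linear(128, 23)  # placeholder; not the paper's architecture
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1e-4,          # learning rate
        eps=1e-3,         # epsilon
        weight_decay=1e-5,
    )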