The NetHack Learning Environment
Authors: Heinrich Küttler, Nantas Nardelli, Alexander Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, Tim Rocktäschel
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration, alongside qualitative analysis of various agents trained in the environment. |
| Researcher Affiliation | Collaboration | Facebook AI Research; University of Oxford; New York University; Imperial College London; University College London |
| Pseudocode | No | The paper describes the model architecture but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | NLE is open source and available at https://github.com/facebookresearch/nle (see the usage sketch after this table). |
| Open Datasets | Yes | "Throughout training, we change NetHack's seed for procedurally generating the environment after every episode." and "Training only on a limited number of seeds leads to high training performance, but poor generalization. The gap between training and test performance becomes narrow when training with at least 1000 seeds..." (see the seeding sketch after this table) |
| Dataset Splits | Yes | Training only on a limited number of seeds leads to high training performance, but poor generalization. The gap between training and test performance becomes narrow when training with at least 1000 seeds, indicating that at that point agents are exposed to sufficient variation during training to make memorization infeasible. |
| Hardware Specification | No | The paper states "We train our models on a single GPU with 16GB of memory.", "We use 256 IMPALA actors, each running on a single CPU core.", and "Our environment implementation achieves up to 20K frames per second (fps) single-threaded on a single CPU core". While GPU memory and CPU core counts are given, specific GPU or CPU models are not provided. |
| Software Dependencies | Yes | We use TorchBeast [44] (PyTorch 1.3) for distributed RL. |
| Experiment Setup | Yes | "For the main experiments, we train the agent's policy for 1B steps in the environment using IMPALA [24] as implemented in TorchBeast [44]." and "The learner updates the network parameters every 10 steps using Adam optimizer [45] with a learning rate of 1e-4, epsilon of 1e-3, and weight decay of 1e-5." and "All models are trained for a total of 10B frames." and "We use a discount factor of 0.999." and "We use a batch size of 256." (see the optimizer sketch after this table) |
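The repository linked in the Open Source Code row exposes NetHack as a Gym environment. The sketch below is a minimal usage example, assuming the `NetHackScore-v0` task ID and the classic four-tuple Gym `step` interface used around the time of the paper; neither detail appears in the table above.

```python
import gym
import nle  # noqa: F401 -- importing nle registers the NetHack tasks with Gym

# "NetHackScore-v0" is assumed here; it is the default score-maximization task
# exposed by the NLE repository.
env = gym.make("NetHackScore-v0")

obs = env.reset()  # observation is a dictionary of NetHack state arrays
done = False
while not done:
    # Random policy, purely for illustration.
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```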
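To illustrate the seed-based generalization protocol quoted in the Open Datasets and Dataset Splits rows, the sketch below re-seeds the dungeon generator before every episode and keeps the training and evaluation seed pools disjoint. The `env.seed(...)` call follows the classic Gym convention and is an assumption; NLE's actual seeding interface may differ.

```python
import random
import gym
import nle  # noqa: F401

TRAIN_SEEDS = list(range(1_000))          # limited training pool (paper reports >= 1000 seeds)
TEST_SEEDS = list(range(10_000, 10_100))  # held-out seeds, never seen during training

env = gym.make("NetHackScore-v0")

def run_episode(env, seed):
    """Run one episode with the dungeon generator seeded beforehand."""
    env.seed(seed)  # assumed classic-Gym seeding call
    env.reset()
    done, episode_return = False, 0.0
    while not done:
        _, reward, done, _ = env.step(env.action_space.sample())
        episode_return += reward
    return episode_return

train_return = run_episode(env, random.choice(TRAIN_SEEDS))
test_return = run_episode(env, random.choice(TEST_SEEDS))
```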
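The hyperparameters quoted in the Experiment Setup row translate directly into a PyTorch optimizer configuration. The sketch below shows only that configuration; the policy network is a hypothetical stand-in, and the IMPALA/V-trace learner loop from TorchBeast is omitted.

```python
import torch

# Hypothetical stand-in for the paper's actual policy network.
policy = torch.nn.Linear(128, 23)

# Adam settings quoted above: lr 1e-4, epsilon 1e-3, weight decay 1e-5.
optimizer = torch.optim.Adam(
    policy.parameters(), lr=1e-4, eps=1e-3, weight_decay=1e-5
)

DISCOUNT = 0.999     # gamma used for return computation
BATCH_SIZE = 256     # learner batch size
UNROLL_LENGTH = 10   # "updates the network parameters every 10 steps"
NUM_ACTORS = 256     # IMPALA actors, one per CPU core (from the Hardware row)
```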