Neural Map: Structured Memory for Deep Reinforcement Learning
Authors: Emilio Parisotto, Ruslan Salakhutdinov
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that the Neural Map surpasses previous DRL memories on a set of challenging 2D and 3D maze environments and show that it is capable of generalizing to environments that were not seen during training. |
| Researcher Affiliation | Academia | Emilio Parisotto & Ruslan Salakhutdinov Department of Machine Learning Carnegie Mellon University Pittsburgh, PA 15213, USA {eparisot,rsalakhu}@cs.cmu.edu |
| Pseudocode | No | The paper specifies its operations through mathematical equations but does not include any pseudocode or algorithm blocks. The first sketch below the table renders those operations in code. |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available or provide any links to a code repository. |
| Open Datasets | No | The mazes during training are generated using a random generator. A held-out set of 1000 random mazes is kept for testing. The paper describes generating its own maze environments, and does not provide access information (link, citation, repository) to a publicly available or open dataset. |
| Dataset Splits | No | The mazes during training are generated using a random generator. A held-out set of 1000 random mazes is kept for testing. The paper mentions training and test sets but does not specify a distinct validation split for hyperparameter tuning, although it does refer to a 'limited hyperparameter sweep'. The second sketch below the table shows how such a seed-based split can be reproduced. |
| Hardware Specification | Yes | The authors would also like to thank NVidia NVAIL award for donating DGX-1 deep learning machine. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575. Specifically, it used the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC). |
| Software Dependencies | No | For optimization, all architectures used the RMSprop optimization algorithm... For optimization, all architectures used the Adam optimization algorithm... The paper mentions optimization algorithms and the ViZDoom API, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For optimization, all architectures used the RMSprop optimization algorithm with gradients thresholded to norm 20 for LSTM, 100 for Neural Map variants, and no thresholding for memory networks. An auxiliary weighted entropy loss with weight 0.01 was used with the synchronous advantage actor-critic (A2C). The learning rate was 0.0025 for LSTM models, 0.005 for Neural Map variants, and 0.001 for memory networks. A2C was run with the number of time steps equal to 5, and training lasted 10 million updates. These settings are sketched in code in the third block below the table. |
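Since the paper defines the Neural Map only through equations, the following is a minimal PyTorch sketch of its four operations: a global read summarizing the map into a vector r_t, a context read that attends over map positions with a query built from the state and r_t, a write producing a new feature at the agent's current (x, y) position, and an update that replaces the map entry at that position. The map size, feature dimension, and the specific networks used for the read and write steps are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of the Neural Map's read/context/write/update operations,
# as defined by the paper's equations. Layer sizes and the CNN used for the
# global read are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralMap(nn.Module):
    def __init__(self, c=32, h=15, w=15, s_dim=256):
        super().__init__()
        self.c, self.h, self.w = c, h, w
        # Global read: conv + FC summarizing the whole map into r_t.
        self.read_conv = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.read_fc = nn.Linear(c * h * w, c)
        # Query projection for the context read: q_t = W [s_t, r_t].
        self.query = nn.Linear(s_dim + c, c)
        # Write network: w_{t+1} = f([s_t, r_t, c_t, M_t^{(x,y)}]).
        self.write = nn.Sequential(nn.Linear(s_dim + 3 * c, c), nn.ReLU(),
                                   nn.Linear(c, c))

    def forward(self, M, s, pos):
        # M: (B, C, H, W) map memory; s: (B, s_dim) state feature;
        # pos: (B, 2) agent (x, y) coordinates as a LongTensor.
        B = M.size(0)
        r = self.read_fc(F.relu(self.read_conv(M)).flatten(1))       # r_t
        q = self.query(torch.cat([s, r], dim=1))                      # q_t
        flat = M.flatten(2)                                           # (B, C, H*W)
        att = F.softmax(torch.einsum('bc,bcn->bn', q, flat), dim=1)   # α over positions
        ctx = torch.einsum('bn,bcn->bc', att, flat)                   # c_t
        m_xy = M[torch.arange(B), :, pos[:, 1], pos[:, 0]]            # M_t^{(x,y)}
        w_new = self.write(torch.cat([s, r, ctx, m_xy], dim=1))       # w_{t+1}
        M_next = M.clone()
        M_next[torch.arange(B), :, pos[:, 1], pos[:, 0]] = w_new      # local update
        o = torch.cat([r, ctx, w_new], dim=1)                         # output o_t
        return o, M_next
```

The design point the sketch preserves is that writes are localized to the agent's current position, which is what gives the memory its spatial structure.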
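Because both the training and test mazes come from a procedural generator rather than a released dataset, the split is naturally expressed in terms of generator seeds. A minimal sketch of holding out 1000 mazes this way; `generate_maze` is a hypothetical stand-in for the paper's actual maze generator:

```python
# Seed-based train/test split for procedurally generated mazes. The
# generator below is a hypothetical stub, not the paper's generator.
import random

def generate_maze(seed, size=9):
    # Stand-in: a seeded random wall grid, just to make the split concrete.
    rng = random.Random(seed)
    return [[rng.random() < 0.3 for _ in range(size)] for _ in range(size)]

split_rng = random.Random(0)
test_seeds = set(split_rng.sample(range(10**9), 1000))  # held-out 1000 mazes

def sample_training_maze():
    # Training mazes are drawn on the fly, skipping held-out seeds.
    while True:
        seed = split_rng.randrange(10**9)
        if seed not in test_seeds:
            return generate_maze(seed)
```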
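The quoted training settings translate directly into an optimizer configuration. A sketch, where the model placeholder and the value-loss weight are assumptions not taken from the paper:

```python
# Sketch of the quoted optimization settings. The values below are the
# Neural Map ones; swap in lr=0.0025 / max_norm=20 for LSTM, and
# lr=0.001 with no clipping for memory networks.
import torch

model = torch.nn.Linear(10, 10)  # placeholder for the actual agent network
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.005)

def a2c_step(policy_loss, value_loss, entropy):
    # Entropy bonus weight 0.01 as quoted; the 0.5 value-loss weight is a
    # conventional A2C choice, not stated in the paper.
    loss = policy_loss + 0.5 * value_loss - 0.01 * entropy
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=100)
    optimizer.step()
```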