Neural Algorithmic Reasoners are Implicit Planners
Authors: Andreea-Ioana Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across eight low-data settings including classical control, navigation and Atari, XLVINs provide significant improvements to data efficiency against value iteration-based implicit planners, as well as relevant model-free baselines. Lastly, we empirically verify that XLVINs can closely align with value iteration. |
| Researcher Affiliation | Collaboration | Andreea Deac Mila Québec AI Institute Université de Montréal Petar Veličković Mila Québec AI Institute Ognjen Milinković Faculty of Mathematics University of Belgrade Pierre-Luc Bacon Mila Québec AI Institute Université de Montréal Jian Tang Mila Québec AI Institute HEC Montréal Mladen Nikolić Faculty of Mathematics University of Belgrade ... Work performed while the author was at DeepMind. ... PV is a Research Scientist at DeepMind. AD was a Research Scientist Intern at DeepMind while completing this work. |
| Pseudocode | Yes | Algorithm 1: XLVIN forward pass |
| Open Source Code | No | We share links to the base implementations of large parts of our agents, environments and PPO optimiser [25]. We aim to open source our full implementation at a later point. |
| Open Datasets | Yes | Continuous-space We focus on four OpenAI Gym environments [9]: classical continuous-state control tasks CartPole, Acrobot and MountainCar, and a continuous-state spaceship navigation task, LunarLander. ... Pixel-space Lastly, we investigate how XLVINs perform on high-dimensional pixel-based observations, using the Atari-2600 [6]. |
| Dataset Splits | No | The paper describes training on policy rollouts and evaluation on test sets, but does not explicitly mention the use or creation of a dedicated validation set with specific splits or percentages. |
| Hardware Specification | Yes | All experiments were conducted on a GPU cluster, using NVIDIA V100 or NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch [31] and Kostrikov's PPO implementation [25], but does not provide specific version numbers for these software dependencies. It also refers to OpenAI Gym [9] and Atari-2600 [6] as environments. |
| Experiment Setup | Yes | The encoder function is a three-layer MLP with ReLU activations, computing 50 output features and F hidden features, where F = 64 for CartPole, F = 32 for Acrobot, F = 16 for MountainCar and F = 64 for LunarLander. The same hidden dimension is also used in the transition function MLP. ... For LunarLander, the XLVIN uses K = 3 executor layers; in all other cases, K = 2. ... We evaluate the agents' low-data performance by allowing only 1,000,000 observed transitions. We re-use exactly the environment and encoder from Kostrikov [25], and run the executor for K = 2 layers for Freeway and Enduro and K = 1 for Alien and H.E.R.O.. |
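The encoder described in the Experiment Setup row can be sketched as below. This is a minimal, hedged reconstruction in PyTorch (which the paper reports using): the quote fixes depth (three layers), activation (ReLU), hidden width F per environment, and 50 output features, but the exact layer ordering and input handling are assumptions, and `make_encoder` is a hypothetical helper name.

```python
import torch
import torch.nn as nn

# Per-environment hidden width F, as stated in the Experiment Setup row.
HIDDEN_DIM = {"CartPole": 64, "Acrobot": 32, "MountainCar": 16, "LunarLander": 64}

def make_encoder(obs_dim: int, env_name: str, out_dim: int = 50) -> nn.Sequential:
    """Three-layer MLP with ReLU activations, F hidden features and
    50 output features (layer arrangement is an assumption)."""
    f = HIDDEN_DIM[env_name]
    return nn.Sequential(
        nn.Linear(obs_dim, f), nn.ReLU(),
        nn.Linear(f, f), nn.ReLU(),
        nn.Linear(f, out_dim),
    )

# CartPole observations are 4-dimensional in OpenAI Gym.
encoder = make_encoder(obs_dim=4, env_name="CartPole")
z = encoder(torch.zeros(1, 4))
print(z.shape)  # torch.Size([1, 50])
```

The same hidden dimension F would, per the quote, also be reused in the transition-function MLP; that module is not sketched here since the paper excerpt gives no further detail about it.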