Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Neural Algorithmic Reasoners are Implicit Planners
Authors: Andreea-Ioana Deac, Petar Veličković, Ognjen Milinkovic, Pierre-Luc Bacon, Jian Tang, Mladen Nikolic
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across eight low-data settings, including classical control, navigation and Atari, XLVINs provide significant improvements to data efficiency against value iteration-based implicit planners, as well as relevant model-free baselines. Lastly, we empirically verify that XLVINs can closely align with value iteration. |
| Researcher Affiliation | Collaboration | Andreea Deac Mila Québec AI Institute Université de Montréal Petar Veličković Mila Québec AI Institute Ognjen Milinković Faculty of Mathematics University of Belgrade Pierre-Luc Bacon Mila Québec AI Institute Université de Montréal Jian Tang Mila Québec AI Institute HEC Montréal Mladen Nikolić Faculty of Mathematics University of Belgrade ... Work performed while the author was at DeepMind. ... PV is a Research Scientist at DeepMind. AD was a Research Scientist Intern at DeepMind while completing this work. |
| Pseudocode | Yes | Algorithm 1: XLVIN forward pass |
| Open Source Code | No | We share links to the base implementations of large parts of our agents, environments and PPO optimiser [25]. We aim to open source our full implementation at a later point. |
| Open Datasets | Yes | Continuous-space: We focus on four OpenAI Gym environments [9]: classical continuous-state control tasks CartPole, Acrobot and MountainCar, and a continuous-state spaceship navigation task, LunarLander. ... Pixel-space: Lastly, we investigate how XLVINs perform on high-dimensional pixel-based observations, using the Atari-2600 [6]. |
| Dataset Splits | No | The paper describes training on policy rollouts and evaluation on test sets, but does not explicitly mention the use or creation of a dedicated validation set with specific splits or percentages. |
| Hardware Specification | Yes | All experiments were conducted on a GPU cluster, using NVIDIA V100 or NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch [31] and Kostrikov's PPO implementation [25], but does not provide specific version numbers for these software dependencies. It also refers to OpenAI Gym [9] and Atari-2600 [6] as environments. |
| Experiment Setup | Yes | The encoder function is a three-layer MLP with ReLU activations, computing 50 output features and F hidden features, where F = 64 for CartPole, F = 32 for Acrobot, F = 16 for MountainCar and F = 64 for LunarLander. The same hidden dimension is also used in the transition function MLP. ... For LunarLander, the XLVIN uses K = 3 executor layers; in all other cases, K = 2. ... We evaluate the agents' low-data performance by allowing only 1,000,000 observed transitions. We re-use exactly the environment and encoder from Kostrikov [25], and run the executor for K = 2 layers for Freeway and Enduro and K = 1 for Alien and H.E.R.O. |
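
The hyperparameters quoted in the Experiment Setup row can be gathered into a small lookup table, which may help when checking them against a reimplementation. The following is only a sketch: the names `XLVINSetup`, `SETUPS` and `encoder_layer_shapes` are hypothetical and do not come from the paper or its code. Two readings are assumed. First, "in all other cases, K = 2" is taken to cover the three other continuous-space tasks. Second, both hidden layers of the three-layer encoder are taken to have width F. The Atari entries leave F unset because the excerpt only says the encoder is re-used from Kostrikov [25].

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ENCODER_OUTPUT_FEATURES = 50  # "computing 50 output features" in the quoted setup


@dataclass(frozen=True)
class XLVINSetup:
    """Hypothetical container for the per-environment values quoted in the table."""
    executor_layers: int                   # K, number of executor layers
    hidden_features: Optional[int] = None  # F; None where the excerpt gives no value


SETUPS = {
    # Continuous-space tasks (K = 2 for the first three is an assumed reading).
    "CartPole": XLVINSetup(executor_layers=2, hidden_features=64),
    "Acrobot": XLVINSetup(executor_layers=2, hidden_features=32),
    "MountainCar": XLVINSetup(executor_layers=2, hidden_features=16),
    "LunarLander": XLVINSetup(executor_layers=3, hidden_features=64),
    # Pixel-space tasks; the encoder is re-used from Kostrikov [25], so F is unset.
    "Freeway": XLVINSetup(executor_layers=2),
    "Enduro": XLVINSetup(executor_layers=2),
    "Alien": XLVINSetup(executor_layers=1),
    "H.E.R.O.": XLVINSetup(executor_layers=1),
}


def encoder_layer_shapes(obs_dim: int, hidden: int) -> List[Tuple[int, int]]:
    """Return (in, out) sizes for a three-layer MLP encoder.

    Assumes both hidden layers have width `hidden`; the excerpt states only
    "F hidden features" and 50 output features.
    """
    return [
        (obs_dim, hidden),
        (hidden, hidden),
        (hidden, ENCODER_OUTPUT_FEATURES),
    ]
```

For instance, `encoder_layer_shapes(4, SETUPS["CartPole"].hidden_features)` gives `[(4, 64), (64, 64), (64, 50)]`, where the observation size of 4 is an assumption about the Gym CartPole task rather than a value stated in the excerpt.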