Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Neural Algorithmic Reasoners are Implicit Planners

Authors: Andreea-Ioana Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

NeurIPS 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Across eight low-data settings, including classical control, navigation and Atari, XLVINs provide significant improvements to data efficiency against value iteration-based implicit planners, as well as relevant model-free baselines. Lastly, we empirically verify that XLVINs can closely align with value iteration.
Researcher Affiliation | Collaboration | Andreea Deac Mila Québec AI Institute Université de Montréal Petar Veličković Mila Québec AI Institute Ognjen Milinković Faculty of Mathematics University of Belgrade Pierre-Luc Bacon Mila Québec AI Institute Université de Montréal Jian Tang Mila Québec AI Institute HEC Montréal Mladen Nikolić Faculty of Mathematics University of Belgrade ... Work performed while the author was at DeepMind. ... PV is a Research Scientist at DeepMind. AD was a Research Scientist Intern at DeepMind while completing this work.
Pseudocode | Yes | Algorithm 1: XLVIN forward pass
Open Source Code | No | We share links to the base implementations of large parts of our agents, environments and PPO optimiser [25]. We aim to open source our full implementation at a later point.
Open Datasets | Yes | Continuous-space. We focus on four OpenAI Gym environments [9]: classical continuous-state control tasks CartPole, Acrobot and MountainCar, and a continuous-state spaceship navigation task, LunarLander. ... Pixel-space. Lastly, we investigate how XLVINs perform on high-dimensional pixel-based observations, using the Atari-2600 [6].
Dataset Splits | No | The paper describes training on policy rollouts and evaluation on test sets, but does not explicitly mention the use or creation of a dedicated validation set with specific splits or percentages.
Hardware Specification | Yes | All experiments were conducted on a GPU cluster, using NVIDIA V100 or NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using PyTorch [31] and Kostrikov's PPO implementation [25], but does not provide specific version numbers for these software dependencies. It also refers to OpenAI Gym [9] and Atari-2600 [6] as environments.
Experiment Setup | Yes | The encoder function is a three-layer MLP with ReLU activations, computing 50 output features and F hidden features, where F = 64 for CartPole, F = 32 for Acrobot, F = 16 for MountainCar and F = 64 for LunarLander. The same hidden dimension is also used in the transition function MLP. ... For LunarLander, the XLVIN uses K = 3 executor layers; in all other cases, K = 2. ... We evaluate the agents' low-data performance by allowing only 1,000,000 observed transitions. We re-use exactly the environment and encoder from Kostrikov [25], and run the executor for K = 2 layers for Freeway and Enduro and K = 1 for Alien and H.E.R.O.
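
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration table. The sketch below is illustrative only and is not the authors' code: the names XLVINSetup, CONTINUOUS_SETUPS, ATARI_EXECUTOR_LAYERS and encoder_layer_shapes are hypothetical, and it assumes that "three-layer MLP" means three linear maps (input to F, F to F, F to 50). The observation dimension is not given in the excerpt, so the caller supplies it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XLVINSetup:
    """Per-environment values quoted in the Experiment Setup row."""
    hidden_features: int  # F, also used by the transition function MLP
    executor_layers: int  # K


# The encoder computes 50 output features in the continuous-space settings.
LATENT_FEATURES = 50

# Continuous-space environments: K = 3 for LunarLander, K = 2 otherwise.
CONTINUOUS_SETUPS = {
    "CartPole": XLVINSetup(hidden_features=64, executor_layers=2),
    "Acrobot": XLVINSetup(hidden_features=32, executor_layers=2),
    "MountainCar": XLVINSetup(hidden_features=16, executor_layers=2),
    "LunarLander": XLVINSetup(hidden_features=64, executor_layers=3),
}

# Pixel-space (Atari) games: only K is quoted; the encoder is re-used from [25].
ATARI_EXECUTOR_LAYERS = {"Freeway": 2, "Enduro": 2, "Alien": 1, "H.E.R.O.": 1}

# Low-data budget, stated in the excerpt alongside the Atari setup.
TRANSITION_BUDGET = 1_000_000


def encoder_layer_shapes(env_name, input_dim):
    """Return (in, out) sizes for the three linear maps of the encoder.

    Assumption: the layout is input_dim -> F -> F -> 50. The excerpt only
    says "three-layer MLP ... 50 output features and F hidden features".
    """
    f = CONTINUOUS_SETUPS[env_name].hidden_features
    return [(input_dim, f), (f, f), (f, LATENT_FEATURES)]
```

For example, encoder_layer_shapes("CartPole", 4) gives the shapes for a 4-dimensional observation; the value 4 is an input chosen for illustration, not a figure taken from the table above.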