Neural Algorithmic Reasoners are Implicit Planners
Authors: Andreea-Ioana Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across eight low-data settings including classical control, navigation and Atari, XLVINs provide significant improvements to data efficiency against value iteration-based implicit planners, as well as relevant model-free baselines. Lastly, we empirically verify that XLVINs can closely align with value iteration. |
| Researcher Affiliation | Collaboration | Andreea Deac Mila Québec AI Institute Université de Montréal Petar Veličković Mila Québec AI Institute Ognjen Milinković Faculty of Mathematics University of Belgrade Pierre-Luc Bacon Mila Québec AI Institute Université de Montréal Jian Tang Mila Québec AI Institute HEC Montréal Mladen Nikolić Faculty of Mathematics University of Belgrade ... Work performed while the author was at DeepMind. ... PV is a Research Scientist at DeepMind. AD was a Research Scientist Intern at DeepMind while completing this work. |
| Pseudocode | Yes | Algorithm 1: XLVIN forward pass |
| Open Source Code | No | We share links to the base implementations of large parts of our agents, environments and PPO optimiser [25]. We aim to open source our full implementation at a later point. |
| Open Datasets | Yes | Continuous-space We focus on four OpenAI Gym environments [9]: classical continuous-state control tasks CartPole, Acrobot and MountainCar, and a continuous-state spaceship navigation task, LunarLander. ... Pixel-space Lastly, we investigate how XLVINs perform on high-dimensional pixel-based observations, using the Atari-2600 [6]. |
| Dataset Splits | No | The paper describes training on policy rollouts and evaluation on test sets, but does not explicitly mention the use or creation of a dedicated validation set with specific splits or percentages. |
| Hardware Specification | Yes | All experiments were conducted on a GPU cluster, using NVIDIA V100 or NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch [31] and Kostrikov's PPO implementation [25], but does not provide specific version numbers for these software dependencies. It also refers to OpenAI Gym [9] and Atari-2600 [6] as environments. |
| Experiment Setup | Yes | The encoder function is a three-layer MLP with ReLU activations, computing 50 output features and F hidden features, where F = 64 for CartPole, F = 32 for Acrobot, F = 16 for MountainCar and F = 64 for LunarLander. The same hidden dimension is also used in the transition function MLP. ... For LunarLander, the XLVIN uses K = 3 executor layers; in all other cases, K = 2. ... We evaluate the agents' low-data performance by allowing only 1,000,000 observed transitions. We re-use exactly the environment and encoder from Kostrikov [25], and run the executor for K = 2 layers for Freeway and Enduro and K = 1 for Alien and H.E.R.O.. |
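The encoder described in the Experiment Setup row can be sketched as below. This is a minimal, hedged reconstruction in PyTorch (which the paper reports using): the quote fixes depth (three layers), activation (ReLU), hidden width F per environment, and 50 output features, but the exact layer ordering and input handling are assumptions, and `make_encoder` is a hypothetical helper name.

```python
import torch
import torch.nn as nn

# Per-environment hidden width F, as stated in the Experiment Setup row.
HIDDEN_DIM = {"CartPole": 64, "Acrobot": 32, "MountainCar": 16, "LunarLander": 64}

def make_encoder(obs_dim: int, env_name: str, out_dim: int = 50) -> nn.Sequential:
    """Three-layer MLP with ReLU activations, F hidden features and
    50 output features (layer arrangement is an assumption)."""
    f = HIDDEN_DIM[env_name]
    return nn.Sequential(
        nn.Linear(obs_dim, f), nn.ReLU(),
        nn.Linear(f, f), nn.ReLU(),
        nn.Linear(f, out_dim),
    )

# CartPole observations are 4-dimensional in OpenAI Gym.
encoder = make_encoder(obs_dim=4, env_name="CartPole")
z = encoder(torch.zeros(1, 4))
print(z.shape)  # torch.Size([1, 50])
```

The same hidden dimension F would, per the quote, also be reused in the transition-function MLP; that module is not sketched here since the paper excerpt gives no further detail about it.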