Neuro-algorithmic Policies Enable Fast Combinatorial Generalization
Authors: Marin Vlastelica, Michal Rolinek, Georg Martius
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of this approach in an offline imitation learning setting where a few expert trajectories are provided. Due to the combinatorial generalization capabilities of planners, our learned policy is able to generalize to new variations in the environment out of the box and needs orders of magnitude fewer samples than naive learners. To validate our hypothesis that embedding planners into neural network architectures leads to better generalization in control problems, we consider several procedurally generated environments (from the Procgen suite (Cobbe et al., 2020) and CRASH JEWEL HUNT) with considerable variation between levels. We compare with the following baselines: a standard behavior cloning (BC) baseline using a ResNet18 architecture trained with a cross-entropy classification loss on the same dataset as our method; the PPO algorithm as implemented in Cobbe et al. (2020); and data-regularized actor-critic (DrAC) (Raileanu et al., 2020). |
| Researcher Affiliation | Academia | 1Max Planck Institute for Intelligent Systems, Tübingen, Germany. Correspondence to: Marin Vlastelica <mvlastelica@tue.mpg.de>. |
| Pseudocode | Yes | Algorithm 1 (Forward and backward pass for the shortest-path algorithm): function FORWARDPASS(C, vs, vg): Y := TDSP(C, vs, vg) // Run Dijkstra's algo.; save Y, C, vs, vg // Needed for backward pass; return Y. function BACKWARDPASS(∇L(Y), λ): load Y, C, vs, vg; Cλ := C + λ∇L(Y) // Calculate modified costs; Yλ := TDSP(Cλ, vs, vg) // Run Dijkstra's algo.; return 1/λ (Yλ − Y) |
| Open Source Code | Yes | Videos and Code are available at martius-lab.github.io/NAP. |
| Open Datasets | Yes | we consider several procedurally generated environments (from the Procgen suite (Cobbe et al., 2020) and CRASH JEWEL HUNT) |
| Dataset Splits | No | No specific quantitative dataset split information (percentages, sample counts, or explicit standard split citations) for training, validation, and test sets was found. The paper only states 'we sample distinct environment configurations for the training set and the test set, respectively.' |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments were provided in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names with versions). |
| Experiment Setup | No | The paper states 'More details on the training procedure and the hyperparameters can be found in the supplementary Sec. D', but does not provide specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings) in the main text itself. |
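The pseudocode quoted in the table follows the blackbox-differentiation scheme of Vlastelica et al. (2020): run the solver on the true costs in the forward pass, then re-run it on costs perturbed by the incoming gradient and return the scaled difference of solutions. Below is a minimal Python sketch of this idea, assuming a 4-connected grid with per-cell costs; `dijkstra_grid` is an illustrative stand-in for the paper's time-dependent shortest-path solver (TDSP), and `lam` corresponds to the λ hyperparameter.

```python
import heapq
import numpy as np

def dijkstra_grid(costs, start, goal):
    """Shortest path on a 4-connected grid (cell cost paid on entry).
    Returns a 0/1 indicator matrix Y of the optimal path."""
    h, w = costs.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = costs[start]
    pq = [(costs[start], start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue  # stale queue entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + costs[nr, nc]
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(pq, (nd, (nr, nc)))
    # Trace the path back from the goal into an indicator matrix.
    path = np.zeros_like(costs)
    node = goal
    while node != start:
        path[node] = 1.0
        node = prev[node]
    path[start] = 1.0
    return path

def backward_pass(costs, y, grad_y, start, goal, lam=20.0):
    """Blackbox-differentiation gradient w.r.t. the cost map:
    solve again on perturbed costs and return (1/lam) * (Y_lam - Y)."""
    c_lam = costs + lam * grad_y              # modified costs
    y_lam = dijkstra_grid(c_lam, start, goal)  # second solver call
    return (y_lam - y) / lam
```

The design point is that the solver is called as a black box twice (once forward, once on perturbed costs), so any exact combinatorial solver can be embedded in the network without modification.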