Neuro-algorithmic Policies Enable Fast Combinatorial Generalization
Authors: Marin Vlastelica, Michal Rolinek, Georg Martius
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of this approach in an offline imitation learning setting where a few expert trajectories are provided. Due to the combinatorial generalization capabilities of planners, our learned policy is able to generalize to new variations in the environment out of the box and needs orders of magnitude fewer samples than naive learners. To validate our hypothesis that embedding planners into neural network architectures leads to better generalization in control problems, we consider several procedurally generated environments (from the Procgen suite (Cobbe et al., 2020) and CRASH JEWEL HUNT) with considerable variation between levels. We compare with the following baselines: a standard behavior cloning (BC) baseline using a ResNet18 architecture trained with a cross-entropy classification loss on the same dataset as our method; the PPO algorithm as implemented in Cobbe et al. (2020); and data-regularized actor-critic (DrAC) (Raileanu et al., 2020). |
| Researcher Affiliation | Academia | 1Max Planck Institute for Intelligent Systems, Tübingen, Germany. Correspondence to: Marin Vlastelica <mvlastelica@tue.mpg.de>. |
| Pseudocode | Yes | Algorithm 1 (Forward and backward pass for the shortest-path algorithm): function FORWARDPASS(C, vs, vg): Y := TDSP(C, vs, vg) // Run Dijkstra's algo.; save Y, C, vs, vg // Needed for backward pass; return Y. function BACKWARDPASS(∇L(Y), λ): load Y, C, vs, vg; Cλ := C + λ∇L(Y) // Calculate modified costs; Yλ := TDSP(Cλ, vs, vg) // Run Dijkstra's algo.; return 1/λ (Yλ − Y) |
| Open Source Code | Yes | Videos and Code are available at martius-lab.github.io/NAP. |
| Open Datasets | Yes | we consider several procedurally generated environments (from the Procgen suite (Cobbe et al., 2020) and CRASH JEWEL HUNT) |
| Dataset Splits | No | No specific quantitative dataset split information (percentages, sample counts, or explicit standard split citations) for training, validation, and test sets was found. The paper only states 'we sample distinct environment configurations for the training set and the test set, respectively.' |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments were provided in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names with versions). |
| Experiment Setup | No | The paper states 'More details on the training procedure and the hyperparameters can be found in the supplementary Sec. D', but does not provide specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings) in the main text itself. |
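The pseudocode quoted in the table follows the blackbox-differentiation scheme of Vlastelica et al. (2020): run the solver on the true costs in the forward pass, then re-run it on costs perturbed by the incoming gradient and return the scaled difference of solutions. Below is a minimal Python sketch of this idea, assuming a 4-connected grid with per-cell costs; `dijkstra_grid` is an illustrative stand-in for the paper's time-dependent shortest-path solver (TDSP), and `lam` corresponds to the λ hyperparameter.

```python
import heapq
import numpy as np

def dijkstra_grid(costs, start, goal):
    """Shortest path on a 4-connected grid (cell cost paid on entry).
    Returns a 0/1 indicator matrix Y of the optimal path."""
    h, w = costs.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = costs[start]
    pq = [(costs[start], start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue  # stale queue entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + costs[nr, nc]
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(pq, (nd, (nr, nc)))
    # Trace the path back from the goal into an indicator matrix.
    path = np.zeros_like(costs)
    node = goal
    while node != start:
        path[node] = 1.0
        node = prev[node]
    path[start] = 1.0
    return path

def backward_pass(costs, y, grad_y, start, goal, lam=20.0):
    """Blackbox-differentiation gradient w.r.t. the cost map:
    solve again on perturbed costs and return (1/lam) * (Y_lam - Y)."""
    c_lam = costs + lam * grad_y              # modified costs
    y_lam = dijkstra_grid(c_lam, start, goal)  # second solver call
    return (y_lam - y) / lam
```

The design point is that the solver is called as a black box twice (once forward, once on perturbed costs), so any exact combinatorial solver can be embedded in the network without modification.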