Planning from Pixels in Environments with Combinatorially Hard Search Spaces

Authors: Marco Bagatella, Miroslav Olšák, Michal Rolínek, Georg Martius

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The purpose of the experimental section is to empirically verify the following claims: (i) PPGS is able to solve challenging environments with an underlying combinatorial structure and (ii) PPGS is able to generalize to unseen variations of the environments, even when trained on few levels.
Researcher Affiliation | Academia | Marco Bagatella, Max Planck Institute for Intelligent Systems, Tübingen, Germany, mbagatella@tue.mpg.de; Mirek Olšák, Computer Science Department, University Innsbruck, Austria, mirek@olsak.net; Michal Rolínek, Max Planck Institute for Intelligent Systems, Tübingen, Germany, michal.rolinek@tue.mpg.de; Georg Martius, Max Planck Institute for Intelligent Systems, Tübingen, Germany, georg.martius@tue.mpg.de
Pseudocode | Yes | Algorithm 1: Simplified one-shot PPGS
Open Source Code | Yes | [2] https://github.com/martius-lab/PPGS, 2021.
Open Datasets | Yes | The last two environments are made available in a public repository [1], where they can also be tested interactively. More details on their implementation are included in Suppl. D. Procgen Maze is from [13].
Dataset Splits | No | The paper does not explicitly provide details on validation dataset splits. It mentions training levels and testing on 100 unseen levels.
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for experiments were explicitly mentioned.
Software Dependencies | No | No specific software versions (e.g., Python, PyTorch, or other libraries/solvers) were explicitly provided.
Experiment Setup | Yes | Note that PPGS uses only 400k samples from a random policy whereas PPO uses 50M on-policy samples.
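
To put the two sample budgets quoted under Experiment Setup in perspective, the short sketch below computes their ratio. The variable names are illustrative only, and the comparison covers raw sample counts alone: the two methods collect data differently (random policy versus on-policy), so the figure should not be read as a like-for-like efficiency measurement.

```python
# Sample budgets as quoted in the Experiment Setup entry.
ppgs_samples = 400_000      # samples from a random policy (PPGS)
ppo_samples = 50_000_000    # on-policy samples (PPO)

# How many times more samples PPO uses, and PPGS's share of PPO's budget.
ratio = ppo_samples / ppgs_samples
fraction = ppgs_samples / ppo_samples

print(f"PPO uses {ratio:.0f}x as many samples as PPGS")
print(f"PPGS uses {fraction:.1%} of PPO's sample budget")
```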