Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Planning from Pixels in Environments with Combinatorially Hard Search Spaces
Authors: Marco Bagatella, Miroslav Olšák, Michal Rolínek, Georg Martius
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The purpose of the experimental section is to empirically verify the following claims: (i) PPGS is able to solve challenging environments with an underlying combinatorial structure and (ii) PPGS is able to generalize to unseen variations of the environments, even when trained on few levels. |
| Researcher Affiliation | Academia | Marco Bagatella (Max Planck Institute for Intelligent Systems, Tübingen, Germany); Mirek Olšák (Computer Science Department, University Innsbruck, Austria); Michal Rolínek (Max Planck Institute for Intelligent Systems, Tübingen, Germany); Georg Martius (Max Planck Institute for Intelligent Systems, Tübingen, Germany) |
| Pseudocode | Yes | Algorithm 1 Simplified one-shot PPGS |
| Open Source Code | Yes | [2] https://github.com/martius-lab/PPGS, 2021. |
| Open Datasets | Yes | The last two environments are made available in a public repository [1], where they can also be tested interactively. More details on their implementation are included in Suppl. D. Procgen Maze is from [13]. |
| Dataset Splits | No | The paper does not explicitly provide details on validation dataset splits. It mentions training levels and testing on 100 unseen levels. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for experiments were explicitly mentioned. |
| Software Dependencies | No | No specific software versions (e.g., Python, PyTorch, or other libraries/solvers) were explicitly provided. |
| Experiment Setup | Yes | Note that PPGS uses only 400k samples from a random policy whereas PPO uses 50M on-policy samples. |