Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control
Authors: Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, Chelsea Finn
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive experiments, we show that (1) UPNs learn effective visual goal-directed policies more efficiently (that is, with less data) than traditional imitation learners; (2) the latent representations induced by optimizing for successful planning can be leveraged to transfer task-related semantics to other agents for more challenging tasks through goal-conditioned reward functions, which to our knowledge has previously not been demonstrated; and (3) the learned planning computation improves when allowed more updates at test-time, even in scenarios of less data, providing encouraging evidence of successful meta-learning for planning. (A sketch of such a goal-conditioned reward appears below the table.) |
| Researcher Affiliation | Academia | UC Berkeley, Computer Science. Correspondence to: Aravind Srinivas <aravind@cs.berkeley.edu>. |
| Pseudocode | Yes | Algorithm 1 GDP(o_t, o_g, α) (a minimal sketch of this planning loop appears below the table) |
| Open Source Code | No | No concrete statement about open-source code availability for the methodology described in the paper. The provided link is for "video highlights" and a project homepage. |
| Open Datasets | No | All methods are trained on the same synthetically-generated expert demonstration datasets. We refer the reader to the supplementary materials for details on the architectures and dataset generation. |
| Dataset Splits | No | No explicit information on training, validation, or test dataset splits (percentages or counts) is provided in the main text. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names or programming language versions) are mentioned in the paper. |
| Experiment Setup | No | T denotes the horizon over which the agent plans, which can depend on the task and hence may be treated as a hyperparameter, while n_p is the number of planning updates performed. Algorithms 1 and 2 mention hyperparameters α and β for step sizes. |
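The gradient-descent planner (GDP) named in the Pseudocode row is an inner optimization loop over a candidate action sequence. The snippet below is a minimal JAX sketch of that idea, not the authors' released code: the encoder `encode`, the latent dynamics model `dynamics`, their parameters, and all shapes and default hyperparameter values (horizon T, number of planning updates n_p, step size α) are hypothetical placeholders.

```python
# Minimal sketch of a gradient-descent planner in the spirit of Algorithm 1 (GDP).
# `encode` and `dynamics` are hypothetical stand-ins for a learned encoder and
# latent forward model; shapes and hyperparameters are illustrative only.
import jax
import jax.numpy as jnp

def plan_loss(actions, params, o_t, o_g, encode, dynamics):
    """Squared distance between the rolled-out latent state and the goal latent."""
    x = encode(params, o_t)            # embed current observation
    x_goal = encode(params, o_g)       # embed goal observation
    for i in range(actions.shape[0]):  # unroll learned dynamics over the plan horizon T
        x = dynamics(params, x, actions[i])
    return jnp.sum((x - x_goal) ** 2)

def gdp(params, o_t, o_g, encode, dynamics, key,
        horizon=8, action_dim=2, n_updates=40, alpha=0.1):
    """Refine an action sequence by n_p gradient steps on the planning loss."""
    actions = 0.1 * jax.random.normal(key, (horizon, action_dim))
    grad_fn = jax.grad(plan_loss)      # gradient w.r.t. the action sequence only
    for _ in range(n_updates):
        actions = actions - alpha * grad_fn(actions, params, o_t, o_g, encode, dynamics)
    return actions
```

In the paper, this planner sits inside an outer imitation objective, so gradients of the imitation loss flow back through the planning updates; the sketch above covers only the inner loop.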
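The transfer result quoted in the Research Type row reuses the learned encoder as a goal-conditioned reward for new agents and tasks. A common way to express this, consistent with the paper's description, is a negative latent-space distance between the current observation and a goal image; the snippet below is a hedged sketch using the same hypothetical `encode` as above, not the authors' exact formulation.

```python
# Sketch of the reward-transfer idea: reward is the negative squared distance
# between learned embeddings of the current observation and the goal image.
import jax.numpy as jnp

def latent_goal_reward(params, o, o_g, encode):
    """Goal-conditioned reward derived from the learned representation."""
    diff = encode(params, o) - encode(params, o_g)
    return -jnp.sum(diff ** 2)
```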