Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control

Authors: Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, Chelsea Finn

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In extensive experiments, we show that (1) UPNs learn effective visual goal-directed policies more efficiently (that is, with less data) than traditional imitation learners; (2) the latent representations induced by optimizing for successful planning can be leveraged to transfer task-related semantics to other agents for more challenging tasks through goal-conditioned reward functions, which to our knowledge has previously not been demonstrated; and (3) the learned planning computation improves when allowed more updates at test-time, even in scenarios of less data, providing encouraging evidence of successful meta-learning for planning.
Researcher Affiliation | Academia | UC Berkeley, Computer Science. Correspondence to: Aravind Srinivas <aravind@cs.berkeley.edu>.
Pseudocode | Yes | Algorithm 1: GDP(o_t, o_g, α), the gradient descent planner (see the illustrative sketch after the table).
Open Source Code | No | No concrete statement about open-source code availability for the methodology described in the paper. The provided link is for "video highlights" and a project homepage.
Open Datasets | No | All methods are trained on the same synthetically-generated expert demonstration datasets. We refer the reader to the supplementary materials for details on the architectures and dataset generation.
Dataset Splits | No | No explicit information on training, validation, or test dataset splits (percentages or counts) is provided in the main text.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names or programming language versions) are mentioned in the paper.
Experiment Setup | No | T denotes the horizon over which the agent plans, which can depend on the task and hence may be treated as a hyperparameter, while n_p is the number of planning updates performed. Algorithms 1 and 2 specify the step-size hyperparameters α and β, but concrete values are not given in the main text. A sketch of where these hyperparameters enter the computation follows the table.
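
For the pseudocode row above, the following is a minimal sketch of the gradient descent planner GDP(o_t, o_g, α) from Algorithm 1, assuming a learned encoder and latent forward model that are trained jointly. The module names, network sizes, tensor shapes, and default hyperparameter values below are illustrative assumptions, not the authors' released code or reported settings.

```python
import torch
import torch.nn as nn

class UPNCore(nn.Module):
    """Illustrative encoder + latent dynamics model; sizes are assumptions."""
    def __init__(self, obs_dim=64, latent_dim=32, action_dim=4):
        super().__init__()
        self.action_dim = action_dim
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))

    def gdp(self, o_t, o_g, horizon=10, n_p=40, alpha=0.1):
        """Gradient descent planner: refine an action sequence by gradient descent
        so that the predicted final latent state matches the encoded goal.
        horizon (T), n_p, and alpha (α) are placeholder values, not the paper's."""
        x_t, x_g = self.encoder(o_t), self.encoder(o_g)
        actions = torch.zeros(horizon, self.action_dim, requires_grad=True)
        for _ in range(n_p):
            x = x_t
            for a in actions:                      # unroll latent dynamics over the plan
                x = self.dynamics(torch.cat([x, a]))
            plan_loss = ((x - x_g) ** 2).sum()     # distance to the goal latent
            grad, = torch.autograd.grad(plan_loss, actions, create_graph=True)
            actions = actions - alpha * grad       # inner planning update with step size α
        return actions
```

Because the planning updates are computed with create_graph=True, an outer training loss can be backpropagated through the planner itself, which is what makes the test-time behavior in claim (3) of the abstract a meta-learning effect.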
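For the experiment-setup row, the paper names T, n_p, α, and β but does not report their values. The sketch below only shows where those hyperparameters would enter an outer imitation-learning loop in the style of Algorithm 2; the use of Adam, the dataset iteration, and all default values are assumptions for illustration.

```python
def train_upn(model, demos, beta=1e-3, horizon=10, n_p=40, alpha=0.1, epochs=100):
    """Outer loop in the style of Algorithm 2: update encoder/dynamics parameters
    by imitating expert actions, backpropagating through the planner.
    Adam stands in for whichever outer optimizer with step size β was actually used."""
    optimizer = torch.optim.Adam(model.parameters(), lr=beta)
    for _ in range(epochs):
        for o_t, o_g, expert_actions in demos:   # (start obs, goal obs, expert action sequence)
            planned = model.gdp(o_t, o_g, horizon=horizon, n_p=n_p, alpha=alpha)
            imitation_loss = ((planned - expert_actions) ** 2).mean()
            optimizer.zero_grad()
            imitation_loss.backward()            # gradients flow through the n_p planning updates
            optimizer.step()
    return model
```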