Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control

Authors: Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, Chelsea Finn

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In extensive experiments, we show that (1) UPNs learn effective visual goal-directed policies more efficiently (that is, with less data) than traditional imitation learners; (2) the latent representations induced by optimizing for successful planning can be leveraged to transfer task-related semantics to other agents for more challenging tasks through goal-conditioned reward functions, which to our knowledge has previously not been demonstrated; and (3) the learned planning computation improves when allowed more updates at test-time, even in scenarios of less data, providing encouraging evidence of successful meta-learning for planning.
Researcher Affiliation | Academia | UC Berkeley, Computer Science. Correspondence to: Aravind Srinivas <aravind@cs.berkeley.edu>.
Pseudocode | Yes | Algorithm 1: GDP(o_t, o_g, α), the gradient descent planner (see the illustrative sketch after the table).
Open Source Code | No | No concrete statement about open-source code availability for the methodology described in the paper. The provided link is for "video highlights" and a project homepage.
Open Datasets | No | All methods are trained on the same synthetically-generated expert demonstration datasets. We refer the reader to the supplementary materials for details on the architectures and dataset generation.
Dataset Splits | No | No explicit information on training, validation, or test dataset splits (percentages or counts) is provided in the main text.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names or programming language versions) are mentioned in the paper.
Experiment Setup | No | T denotes the horizon over which the agent plans, which can depend on the task and hence may be treated as a hyperparameter, while n_p is the number of planning updates performed. Algorithms 1 and 2 specify the step-size hyperparameters α and β, but concrete values are not given in the main text. A sketch of where these hyperparameters enter the computation follows the table.
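
For the pseudocode row above, the following is a minimal sketch of the gradient descent planner GDP(o_t, o_g, α) from Algorithm 1, assuming a learned encoder and latent forward model that are trained jointly. The module names, network sizes, tensor shapes, and default hyperparameter values below are illustrative assumptions, not the authors' released code or reported settings.

```python
import torch
import torch.nn as nn

class UPNCore(nn.Module):
    """Illustrative encoder + latent dynamics model; sizes are assumptions."""
    def __init__(self, obs_dim=64, latent_dim=32, action_dim=4):
        super().__init__()
        self.action_dim = action_dim
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))

    def gdp(self, o_t, o_g, horizon=10, n_p=40, alpha=0.1):
        """Gradient descent planner: refine an action sequence by gradient descent
        so that the predicted final latent state matches the encoded goal.
        horizon (T), n_p, and alpha (α) are placeholder values, not the paper's."""
        x_t, x_g = self.encoder(o_t), self.encoder(o_g)
        actions = torch.zeros(horizon, self.action_dim, requires_grad=True)
        for _ in range(n_p):
            x = x_t
            for a in actions:                      # unroll latent dynamics over the plan
                x = self.dynamics(torch.cat([x, a]))
            plan_loss = ((x - x_g) ** 2).sum()     # distance to the goal latent
            grad, = torch.autograd.grad(plan_loss, actions, create_graph=True)
            actions = actions - alpha * grad       # inner planning update with step size α
        return actions
```

Because the planning updates are computed with create_graph=True, an outer training loss can be backpropagated through the planner itself, which is what makes the test-time behavior in claim (3) of the abstract a meta-learning effect.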
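For the experiment-setup row, the paper names T, n_p, α, and β but does not report their values. The sketch below only shows where those hyperparameters would enter an outer imitation-learning loop in the style of Algorithm 2; the use of Adam, the dataset iteration, and all default values are assumptions for illustration.

```python
def train_upn(model, demos, beta=1e-3, horizon=10, n_p=40, alpha=0.1, epochs=100):
    """Outer loop in the style of Algorithm 2: update encoder/dynamics parameters
    by imitating expert actions, backpropagating through the planner.
    Adam stands in for whichever outer optimizer with step size β was actually used."""
    optimizer = torch.optim.Adam(model.parameters(), lr=beta)
    for _ in range(epochs):
        for o_t, o_g, expert_actions in demos:   # (start obs, goal obs, expert action sequence)
            planned = model.gdp(o_t, o_g, horizon=horizon, n_p=n_p, alpha=alpha)
            imitation_loss = ((planned - expert_actions) ** 2).mean()
            optimizer.zero_grad()
            imitation_loss.backward()            # gradients flow through the n_p planning updates
            optimizer.step()
    return model
```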