Unsupervised Perceptual Rewards for Imitation Learning

Authors: Pierre Sermanet, Kelvin Xu, Sergey Levine

ICLR 2017

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the LLM response supporting it.

Research Type: Experimental
LLM Response: 'To evaluate the learned reward functions, we present qualitative results on two real-world tasks and a quantitative evaluation against a human-designed reward function. We also demonstrate that our method can be used to learn a complex real-world door opening skill using a real robot' and 'a set of empirical experiments that show that the learned visual representations inside a pre-trained deep model are general enough to be directly used to represent goals and subgoals for manipulation skills in new scenes without retraining.'

Researcher Affiliation: Industry
LLM Response: 'Pierre Sermanet, Kelvin Xu & Sergey Levine, Google Brain, {sermanet,kelvinxx,slevine}@google.com' and 'Work done as part of the Google Brain Residency program (g.co/brainresidency).'

Pseudocode: Yes
LLM Response: The paper provides 'Algorithm 1: Recursive similarity maximization' and 'Algorithm 2', a greedy, binary algorithm similar to and utilizing Algorithm 1. Both captions define AverageStd() as a function that computes the average standard deviation over a set of frames or over a set of values, Join() as a function that joins values or lists together into a single list, n as the number of splits desired, and min_size as the minimum size of a split.
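
The pseudocode itself is not reproduced in this report. For illustration, below is a minimal Python sketch of what the Algorithm 1 caption describes: recursively choosing split points that minimize the average standard deviation of the features within the resulting segments (i.e., maximizing within-segment similarity). The function names, the [time, features] input layout, and the even allocation of remaining splits across the two sides of a cut are our own assumptions, not the paper's exact procedure.

```python
import numpy as np

def average_std(frames):
    """AverageStd() from the caption: mean per-feature standard
    deviation over one segment of feature frames."""
    return float(np.mean(np.std(frames, axis=0)))

def recursive_split(frames, n, min_size):
    """Recursively split `frames` (a [time, features] array) into `n`
    segments so that the mean of the segments' average standard
    deviations is minimized."""
    if n <= 1 or len(frames) < 2 * min_size:
        return [frames]
    # Assumption: give roughly half of the remaining splits to each side.
    n_left, n_right = n // 2, n - n // 2
    best_cost, best_segments = np.inf, [frames]
    for i in range(min_size, len(frames) - min_size + 1):
        segments = (recursive_split(frames[:i], n_left, min_size)
                    + recursive_split(frames[i:], n_right, min_size))
        cost = np.mean([average_std(s) for s in segments])
        if cost < best_cost:
            best_cost, best_segments = cost, segments
    return best_segments

# Example: split 60 frames of 16-d features into 3 segments.
frames = np.random.rand(60, 16)
segments = recursive_split(frames, n=3, min_size=5)
print([len(s) for s in segments])
```

The exhaustive search over cut positions is deliberately naive; the paper's greedy, binary Algorithm 2 exists precisely to replace it with a cheaper search.
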
Open Source Code: No
LLM Response: The paper mentions using and implementing various algorithms (e.g., the PI2 reinforcement learning algorithm, the Inception network), but it does not explicitly state that the source code for its method or implementation is publicly available, nor does it link to a code repository.

Open Datasets: Yes
LLM Response: 'We use the Inception network (Szegedy et al., 2015) pre-trained for ImageNet classification (Deng et al., 2009) to obtain the visual features for representing the learned rewards.'
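
The quoted sentence is the basis for the 'Yes': the external data the method depends on is the publicly available ImageNet-pretrained Inception model. As a hedged illustration of this kind of feature extraction, the sketch below uses torchvision's inception_v3; the paper takes activations from intermediate layers of the original Inception architecture, so the specific model, layer, and pooled 2048-d output here are assumptions.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load an ImageNet-pretrained Inception and drop the classification head
# so the forward pass yields a pooled 2048-d visual feature vector.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(342),
    transforms.CenterCrop(299),        # Inception v3 expects 299x299 inputs
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def frame_features(path):
    """Feature vector for one video frame (hypothetical helper)."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0)   # shape: (2048,)
```
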
Dataset Splits: No
LLM Response: The paper mentions evaluating on a 'pouring validation set' in Figure 4 and discusses performance in 'validation and testing' in Section 2.3, but it does not specify explicit training/validation/test split percentages or sample counts for the collected door-opening and pouring datasets, which would be needed to reproduce the data partitioning.

Hardware Specification: No
LLM Response: The paper describes the robotic setup used for experiments ('We use a 7-DoF robotic arm with a two-finger gripper, and a camera placed above the shoulder, which provides monocular RGB images.'), but it does not specify the computational resources used to train or run the models, such as GPU or CPU models, memory, or cloud computing instances.

Software Dependencies: No
LLM Response: The paper mentions using the Inception network and the PI2 reinforcement learning algorithm (referring to other papers for the specific implementations), but it does not list software dependencies with version numbers, such as Python versions, deep learning frameworks (e.g., TensorFlow, PyTorch), or CUDA versions, which are necessary for reproducibility.

Experiment Setup: Yes
LLM Response: 'We run PI2 for 11 iterations with 10 sampled trajectories at each iteration.' and 'We empirically choose α = 5.0 and M = 32 for our subsequent experiments.'
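
The quoted values (11 iterations, 10 sampled trajectories per iteration) fit a standard PI2 policy-improvement loop. Since the paper defers to other work for its specific PI2 implementation, the following is only a simplified, parameter-space sketch of the update; rollout_cost, the noise scale, and the temperature are placeholders of our own, not the paper's settings.

```python
import numpy as np

def pi2_update(theta, rollout_cost, num_samples=10, noise_std=0.1,
               temperature=1.0):
    """One simplified, parameter-space PI2-style improvement step:
    perturb the policy parameters, weight each perturbation by the
    exponentiated negative cost of its rollout, and average."""
    eps = np.random.randn(num_samples, theta.size) * noise_std
    costs = np.array([rollout_cost(theta + e) for e in eps])
    z = -(costs - costs.min()) / temperature   # lower cost -> larger weight
    weights = np.exp(z) / np.exp(z).sum()
    return theta + weights @ eps

def rollout_cost(theta):
    # Placeholder episode cost; in the paper this would come from the
    # learned perceptual reward accumulated over a robot trajectory.
    return float(np.sum(theta ** 2))

theta = np.ones(8)
for _ in range(11):   # 11 PI2 iterations, 10 sampled trajectories each
    theta = pi2_update(theta, rollout_cost, num_samples=10)
```
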