Unsupervised Perceptual Rewards for Imitation Learning
Authors: Pierre Sermanet, Kelvin Xu, Sergey Levine
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the learned reward functions, we present qualitative results on two real-world tasks and a quantitative evaluation against a human-designed reward function. We also demonstrate that our method can be used to learn a complex real-world door opening skill using a real robot, and we present a set of empirical experiments showing that the learned visual representations inside a pre-trained deep model are general enough to be directly used to represent goals and subgoals for manipulation skills in new scenes without retraining. |
| Researcher Affiliation | Industry | Pierre Sermanet, Kelvin Xu & Sergey Levine, Google Brain ({sermanet,kelvinxx,slevine}@google.com); work done as part of the Google Brain Residency program (g.co/brainresidency). |
| Pseudocode | Yes | Algorithm 1 (Recursive similarity maximization) and Algorithm 2 (a greedy, binary algorithm similar to and utilizing Algorithm 1), where AverageStd() is a function that computes the average standard deviation over a set of frames or over a set of values, Join() is a function that joins values or lists together into a single list, n is the number of splits desired, and min_size is the minimum size of a split. A hedged sketch of this segmentation step appears after the table. |
| Open Source Code | No | The paper mentions using and implementing various algorithms (e.g., PI2 reinforcement learning, Inception network), but it does not provide an explicit statement about making the source code for their specific method or implementation publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use the Inception network (Szegedy et al., 2015) pre-trained for ImageNet classification (Deng et al., 2009) to obtain the visual features for representing the learned rewards. A sketch of a feature-based reward built on such a pre-trained network appears after the table. |
| Dataset Splits | No | The paper mentions evaluating on a 'pouring validation set' in Figure 4 and discusses performance in 'validation and testing' in Section 2.3. However, it does not specify explicit training/validation/test split percentages or sample counts for their collected datasets (door opening and pouring tasks), which would be needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper describes the robotic arm setup used for experiments ('We use a 7-DoF robotic arm with a two-finger gripper, and a camera placed above the shoulder, which provides monocular RGB images.'), but it does not specify any hardware details for the computational resources used to train or run the models, such as specific GPU or CPU models, memory, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using the 'Inception network' and the 'PI2 reinforcement learning algorithm' (with reference to other papers for specific implementations), but it does not list any specific software dependencies with version numbers, such as Python versions, deep learning frameworks (e.g., TensorFlow, PyTorch), or CUDA versions, which are necessary for reproducibility. |
| Experiment Setup | Yes | We run PI2 for 11 iterations with 10 sampled trajectories at each iteration, and we empirically choose α = 5.0 and M = 32 for our subsequent experiments. A minimal PI2-style loop with these settings is sketched after the table. |
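
The pseudocode row describes a recursive similarity-maximization segmentation. The following is a minimal sketch of that idea, assuming demonstration frames are already encoded as feature vectors; the helper names `average_std` and `split`, the brute-force boundary search, and the use of plain list concatenation in place of Join() are illustrative assumptions, not the paper's exact Algorithm 1 or 2.

```python
import numpy as np

def average_std(frames):
    """Average per-dimension standard deviation over a set of frame features."""
    return float(np.mean(np.std(frames, axis=0)))

def split(frames, n, min_size=1):
    """Recursively split `frames` (array of shape [T, D]) into `n` contiguous
    segments whose within-segment feature variation (average std) is minimal.

    Returns a list of segments (each a numpy array of frames)."""
    assert len(frames) >= n * min_size, "not enough frames for the requested split"
    if n == 1:
        return [frames]
    best_cost, best_segments = None, None
    # Try every valid position for the first boundary, then recurse on the rest.
    for i in range(min_size, len(frames) - min_size * (n - 1) + 1):
        head = frames[:i]
        segments = [head] + split(frames[i:], n - 1, min_size)
        cost = np.mean([average_std(s) for s in segments])
        if best_cost is None or cost < best_cost:
            best_cost, best_segments = cost, segments
    return best_segments
```

For example, `split(demo_features, n=3, min_size=5)` would partition a demonstration into three contiguous segments of at least five frames each with minimal internal feature variation.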
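The Open Datasets row notes that the rewards are represented with Inception features pre-trained for ImageNet classification. Below is a minimal sketch of that idea, assuming TensorFlow/Keras and using InceptionV3 as a stand-in backbone; the choice of pooled top-level activations and a negative-distance reward to the mean goal features are assumptions made for illustration, not the paper's learned feature-selection scheme.

```python
import numpy as np
import tensorflow as tf

# Stand-in feature extractor: an ImageNet-pretrained Inception network
# (the exact Inception variant and layer are assumptions here).
extractor = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")

def features(image_batch):
    """image_batch: uint8 RGB frames of shape (N, 299, 299, 3)."""
    x = tf.keras.applications.inception_v3.preprocess_input(
        tf.cast(image_batch, tf.float32))
    return extractor(x, training=False).numpy()

def make_goal_reward(goal_frames):
    """Build a simple perceptual reward: similarity of the current frame's
    features to the mean features of a demonstrated goal segment."""
    goal_mean = features(goal_frames).mean(axis=0)
    def reward(frame):
        f = features(frame[None])[0]
        return -float(np.linalg.norm(f - goal_mean))  # higher is closer to the goal
    return reward
```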
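The experiment-setup row reports 11 PI2 iterations with 10 sampled trajectories per iteration. The sketch below shows a generic episodic PI2-style parameter update with those counts, assuming Gaussian parameter perturbations and exponentiated-cost weighting; the exploration scale `sigma`, temperature `lam`, and the `rollout_cost` interface are assumptions, and the paper defers to prior work for its actual PI2 implementation.

```python
import numpy as np

def pi2(rollout_cost, theta, iterations=11, samples=10, sigma=0.1, lam=1.0):
    """Minimal episodic PI2-style update over policy parameters `theta`.

    rollout_cost(theta) -> scalar cost of executing the policy with those
    parameters (here this would be the negative perceptual reward
    accumulated over a robot rollout)."""
    for _ in range(iterations):
        eps = sigma * np.random.randn(samples, theta.size)        # exploration noise
        costs = np.array([rollout_cost(theta + e) for e in eps])  # evaluate each sample
        # Soft-max weights: low-cost samples get exponentially more weight.
        s = (costs - costs.min()) / max(costs.max() - costs.min(), 1e-8)
        w = np.exp(-s / lam)
        w /= w.sum()
        theta = theta + w @ eps                                   # weighted parameter update
    return theta
```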