Generalizing Skills with Semi-Supervised Reinforcement Learning

Authors: Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5 EXPERIMENTAL EVALUATION"; "We report the success rate of policies learned with each method in Table 1, and visualize the generalization performance in the 2-link reacher, cheetah, and obstacle tasks in Figure 3."
Researcher Affiliation | Collaboration | Berkeley AI Research (BAIR), University of California, Berkeley; OpenAI. {cbfinn,tianhe.yu,justinfu,pabbeel,svlevine}@berkeley.edu
Pseudocode | Yes | "Algorithm 1 Semi-Supervised Skill Generalization" (a hedged sketch of this loop appears after the table).
Open Source Code | Yes | "Code for reproducing the simulated experiments is available online." The referenced footnote points to github.com/cbfinn/gps/tree/ssrl.
Open Datasets | No | "Thus, we define our own set of simulated control tasks for this paper, explicitly considering the types of variation that a robot might encounter in the real world."
Dataset Splits | No | The paper discusses 'labeled MDPs' and 'unlabeled MDPs' but does not provide explicit training/validation/test splits with percentages or sample counts.
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments are mentioned.
Software Dependencies | No | The paper mentions the MuJoCo simulator and methods such as mirror descent guided policy search (MDGPS), but does not give specific software names with version numbers for reproducibility.
Experiment Setup | Yes | "For the non-visual tasks, the policy was represented using a neural network with 2 hidden layers of 40 units each. The vision task used 3 convolutional layers with 15 filters of size 5×5 each, followed by the spatial feature point transformation proposed by Levine et al. (2016), and lastly 3 fully-connected layers of 20 units each. The reward function architecture mirrored the architecture of the policy, but using a quadratic norm on the output, as done by Finn et al. (2016)." (See the architecture sketch below.)
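
Algorithm 1 (S3G) alternates between fitting a reward model from "labeled" experience and improving the policy in the unlabeled MDPs. The following is a minimal sketch of that outer loop under loud assumptions: the toy linear dynamics, the `LinearGaussianPolicy`, `RewardModel`, `rollout`, and `improve_policy` names are all hypothetical stand-ins, not the authors' MDGPS or guided-cost-learning implementations.

```python
# Hedged sketch of the S3G outer loop (Algorithm 1). All classes and helpers
# here are simplified stand-ins for the paper's components.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM = 4  # toy state dimension; actions share this dimension for simplicity


class LinearGaussianPolicy:
    """Toy stand-in for the paper's neural-network policy."""

    def __init__(self, dim):
        self.W = rng.normal(scale=0.1, size=(dim, dim))

    def act(self, obs):
        return self.W @ obs + rng.normal(scale=0.1, size=obs.shape)


class RewardModel:
    """Toy stand-in for the learned reward r_theta (linear in the state)."""

    def __init__(self, dim):
        self.theta = np.zeros(dim)

    def __call__(self, obs):
        return float(self.theta @ obs)

    def irl_update(self, demos, samples, lr=0.01):
        # MaxEnt-IRL-style step: raise reward on demonstration states, lower
        # it on policy samples (the paper uses guided cost learning here).
        self.theta += lr * (demos.mean(axis=0) - samples.mean(axis=0))


def rollout(policy, T=20):
    """Collect one trajectory of states from toy stable linear dynamics."""
    obs, traj = rng.normal(size=OBS_DIM), []
    for _ in range(T):
        traj.append(obs.copy())
        obs = 0.9 * obs + 0.1 * policy.act(obs)
    return np.array(traj)


def improve_policy(policy, reward_fn, n_candidates=8, noise=0.05):
    """Random-search policy improvement standing in for the MDGPS update."""
    def score(W):
        saved, policy.W = policy.W, W
        val = float(np.mean([reward_fn(o) for o in rollout(policy)]))
        policy.W = saved
        return val

    best_W, best = policy.W, score(policy.W)
    for _ in range(n_candidates):
        cand = policy.W + rng.normal(scale=noise, size=policy.W.shape)
        s = score(cand)
        if s > best:
            best_W, best = cand, s
    policy.W = best_W


# Step 1 (assumed already done): pi_RL was trained on the labeled MDPs with
# the true reward; its rollouts there serve as the "demonstrations".
pi_rl = LinearGaussianPolicy(OBS_DIM)
demos = np.concatenate([rollout(pi_rl) for _ in range(5)])

# Remaining steps: alternate reward learning and policy improvement in the
# unlabeled MDPs, seeding the IRL sample set with the demonstrations.
reward, pi = RewardModel(OBS_DIM), LinearGaussianPolicy(OBS_DIM)
samples = demos.copy()
for it in range(10):
    new = np.concatenate([rollout(pi) for _ in range(5)])  # act in unlabeled MDPs
    samples = np.concatenate([samples, new])
    reward.irl_update(demos, samples)   # update r_theta via the IRL objective
    improve_policy(pi, reward)          # RL step with the learned reward

print("final mean learned reward:", np.mean([reward(o) for o in rollout(pi)]))
```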
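
The quoted experiment setup pins down layer counts and widths, which is enough for a hedged PyTorch reconstruction of the two policy architectures. Strides, padding, input resolution, and the ReLU nonlinearity are assumptions not stated in the excerpt; the spatial feature-point module implements the per-channel soft-argmax described by Levine et al. (2016).

```python
# Hedged PyTorch sketch of the architectures quoted above. Only the layer
# counts and widths come from the paper; everything else is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_state_policy(obs_dim: int, act_dim: int) -> nn.Module:
    """Non-visual policy: 2 hidden layers of 40 units each."""
    return nn.Sequential(
        nn.Linear(obs_dim, 40), nn.ReLU(),
        nn.Linear(40, 40), nn.ReLU(),
        nn.Linear(40, act_dim),
    )


class SpatialSoftmax(nn.Module):
    """Spatial feature points: per-channel soft-argmax over the feature map,
    as described by Levine et al. (2016)."""

    def forward(self, x):
        n, c, h, w = x.shape
        probs = F.softmax(x.view(n, c, h * w), dim=-1).view(n, c, h, w)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device)
        ys = torch.linspace(-1.0, 1.0, h, device=x.device)
        ex = (probs.sum(dim=2) * xs).sum(dim=-1)  # expected x-coord per channel
        ey = (probs.sum(dim=3) * ys).sum(dim=-1)  # expected y-coord per channel
        return torch.cat([ex, ey], dim=1)         # (n, 2 * c) feature points


class VisionPolicy(nn.Module):
    """Vision policy: 3 conv layers (15 filters of size 5x5), spatial feature
    points, then 3 fully-connected layers of 20 units each."""

    def __init__(self, in_channels: int, act_dim: int):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 15, kernel_size=5), nn.ReLU(),
            nn.Conv2d(15, 15, kernel_size=5), nn.ReLU(),
            nn.Conv2d(15, 15, kernel_size=5), nn.ReLU(),
        )
        self.points = SpatialSoftmax()
        self.head = nn.Sequential(
            nn.Linear(2 * 15, 20), nn.ReLU(),
            nn.Linear(20, 20), nn.ReLU(),
            nn.Linear(20, act_dim),
        )

    def forward(self, img):
        return self.head(self.points(self.convs(img)))


# Smoke test with an assumed 64x64 RGB input and 7-dimensional action output.
out = VisionPolicy(in_channels=3, act_dim=7)(torch.zeros(1, 3, 64, 64))
assert out.shape == (1, 7)
```

Per the quoted setup, the reward network would mirror this trunk but apply a quadratic norm to the output (Finn et al., 2016); that head is omitted from the sketch.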