Generalizing Skills with Semi-Supervised Reinforcement Learning
Authors: Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, Experimental Evaluation: We report the success rate of policies learned with each method in Table 1, and visualize the generalization performance in the 2-link reacher, cheetah, and obstacle tasks in Figure 3. |
| Researcher Affiliation | Collaboration | Berkeley AI Research (BAIR), University of California, Berkeley; OpenAI. {cbfinn,tianhe.yu,justinfu,pabbeel,svlevine}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Semi-Supervised Skill Generalization |
| Open Source Code | Yes | Code for reproducing the simulated experiments is available online (footnote 1: The code is available at github.com/cbfinn/gps/tree/ssrl). |
| Open Datasets | No | Thus, we define our own set of simulated control tasks for this paper, explicitly considering the types of variation that a robot might encounter in the real world. |
| Dataset Splits | No | The paper discusses 'labeled MDPs' and 'unlabeled MDPs' but does not provide explicit training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running experiments are mentioned. |
| Software Dependencies | No | The paper mentions the MuJoCo simulator and methods like mirror descent guided policy search (MDGPS), but does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | For the non-visual tasks, the policy was represented using a neural network with 2 hidden layers of 40 units each. The vision task used 3 convolutional layers with 15 filters of size 5×5 each, followed by the spatial feature point transformation proposed by Levine et al. (2016), and lastly 3 fully-connected layers of 20 units each. The reward function architecture mirrored that of the policy, but using a quadratic norm on the output, as done by Finn et al. (2016). A hedged architecture sketch follows the table. |
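
To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of the two policy architectures and the quadratic-norm reward head. This is not the authors' code (their released GPS code, linked above, is the authoritative reference): the choice of PyTorch, the ReLU nonlinearities, the lack of padding/stride details, whether the action output sits on top of the stated hidden layers, and all class names are assumptions for illustration only.

```python
# Minimal sketch of the architectures quoted in the "Experiment Setup" row.
# Everything not stated in the quote (framework, nonlinearities, output layer
# placement, reward parameterization details) is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonVisualPolicy(nn.Module):
    """Non-visual tasks: 2 hidden layers of 40 units each, per the paper."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),  # assumed linear action output
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class SpatialSoftmax(nn.Module):
    """Spatial feature-point transformation: expected (x, y) coordinate of a
    per-channel spatial softmax, in the spirit of Levine et al. (2016)."""

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        b, c, h, w = features.shape
        # Softmax over the spatial locations of each feature map.
        attn = F.softmax(features.view(b, c, h * w), dim=-1).view(b, c, h, w)
        xs = torch.linspace(-1.0, 1.0, w, device=features.device)
        ys = torch.linspace(-1.0, 1.0, h, device=features.device)
        expected_x = (attn.sum(dim=2) * xs).sum(dim=-1)  # (b, c)
        expected_y = (attn.sum(dim=3) * ys).sum(dim=-1)  # (b, c)
        return torch.cat([expected_x, expected_y], dim=1)  # (b, 2c) feature points


class VisionPolicy(nn.Module):
    """Vision task: 3 conv layers with 15 filters of size 5x5, spatial feature
    points, then 3 fully-connected layers of 20 units each."""

    def __init__(self, in_channels: int, action_dim: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 15, kernel_size=5), nn.ReLU(),
            nn.Conv2d(15, 15, kernel_size=5), nn.ReLU(),
            nn.Conv2d(15, 15, kernel_size=5), nn.ReLU(),
        )
        self.feature_points = SpatialSoftmax()
        self.fc = nn.Sequential(
            nn.Linear(2 * 15, 20), nn.ReLU(),
            nn.Linear(20, 20), nn.ReLU(),
            nn.Linear(20, 20), nn.ReLU(),
            nn.Linear(20, action_dim),  # assumed linear action output
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.fc(self.feature_points(self.conv(image)))


class QuadraticNormCost(nn.Module):
    """Cost/reward head mirroring the policy trunk, with a quadratic norm on
    its output; one plausible reading of the Finn et al. (2016) setup."""

    def __init__(self, trunk: nn.Module):
        super().__init__()
        self.trunk = trunk  # same architecture family as the policy network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.trunk(x)
        return (y ** 2).sum(dim=-1)  # squared L2 norm; the reward would be its negative
```

The `SpatialSoftmax` module turns each of the 15 feature maps into an expected (x, y) image coordinate, so the fully-connected layers operate on 30 feature-point values rather than on full feature maps, which is the point of the feature-point transformation cited in the setup.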