Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Generalizing Skills with Semi-Supervised Reinforcement Learning
Authors: Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTAL EVALUATION; We report the success rate of policies learned with each method in Table 1, and visualize the generalization performance in the 2-link reacher, cheetah, and obstacle tasks in Figure 3. |
| Researcher Affiliation | Collaboration | Berkeley AI Research (BAIR), University of California, Berkeley; OpenAI |
| Pseudocode | Yes | Algorithm 1 Semi-Supervised Skill Generalization |
| Open Source Code | Yes | Code for reproducing the simulated experiments is available online. Footnote 1: The code is available at github.com/cbfinn/gps/tree/ssrl |
| Open Datasets | No | Thus, we define our own set of simulated control tasks for this paper, explicitly considering the types of variation that a robot might encounter in the real world. |
| Dataset Splits | No | The paper discusses 'labeled MDPs' and 'unlabeled MDPs' but does not provide explicit training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running experiments are mentioned. |
| Software Dependencies | No | The paper mentions 'MuJoCo simulator' and methods like 'mirror descent guided policy search (MDGPS)' but does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | For the non-visual tasks, the policy was represented using a neural network with 2 hidden layers of 40 units each. The vision task used 3 convolutional layers with 15 filters of size 5 × 5 each, followed by the spatial feature point transformation proposed by Levine et al. (2016), and lastly 3 fully-connected layers of 20 units each. The reward function architecture mirrored the architecture as the policy, but using a quadratic norm on the output, as done by Finn et al. (2016). |
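
The non-visual policy architecture quoted in the Experiment Setup row (2 hidden layers of 40 units each) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The observation and action dimensions, the ReLU activation, the weight initialization, and the helper names (`init_mlp`, `forward`, `squared_norm_head`, `count_params`) are all assumptions made for illustration; the quoted text specifies only the layer count and width. The squared-norm head is one plausible reading of "a quadratic norm on the output", and its sign convention is likewise an assumption.

```python
import random

def init_mlp(sizes, seed=0):
    """Build one (weights, biases) pair per pair of consecutive layer sizes."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / n_in) ** 0.5  # assumed initialization, not from the paper
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params

def forward(params, x):
    """Run the network: ReLU on hidden layers (assumed), linear output layer."""
    last = len(params) - 1
    for i, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < last:
            x = [max(0.0, v) for v in x]
    return x

def squared_norm_head(params, x):
    """Squared Euclidean norm of the network output (assumed reading of
    'quadratic norm on the output'; the sign convention is not specified)."""
    return sum(v * v for v in forward(params, x))

def count_params(params):
    """Total number of weights and biases across all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)

# Hypothetical dimensions; the quoted text does not state them.
OBS_DIM, ACT_DIM = 10, 2

policy = init_mlp([OBS_DIM, 40, 40, ACT_DIM])  # 2 hidden layers of 40 units
action = forward(policy, [0.1] * OBS_DIM)
n_params = count_params(policy)
quad = squared_norm_head(policy, [0.1] * OBS_DIM)
```

With these assumed dimensions the network has (10·40 + 40) + (40·40 + 40) + (40·2 + 2) = 2162 parameters. The count changes with the actual observation and action sizes of each task.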