Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalizing Skills with Semi-Supervised Reinforcement Learning

Authors: Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine

ICLR 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experimental Evaluation): "We report the success rate of policies learned with each method in Table 1, and visualize the generalization performance in the 2-link reacher, cheetah, and obstacle tasks in Figure 3." |
| Researcher Affiliation | Collaboration | Berkeley AI Research (BAIR), University of California, Berkeley; OpenAI |
| Pseudocode | Yes | "Algorithm 1: Semi-Supervised Skill Generalization" |
| Open Source Code | Yes | "Code for reproducing the simulated experiments is available online." The code is available at github.com/cbfinn/gps/tree/ssrl |
| Open Datasets | No | "Thus, we define our own set of simulated control tasks for this paper, explicitly considering the types of variation that a robot might encounter in the real world." |
| Dataset Splits | No | The paper discusses 'labeled MDPs' and 'unlabeled MDPs' but does not provide explicit training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running experiments are mentioned. |
| Software Dependencies | No | The paper mentions the MuJoCo simulator and methods such as mirror descent guided policy search (MDGPS), but does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | "For the non-visual tasks, the policy was represented using a neural network with 2 hidden layers of 40 units each. The vision task used 3 convolutional layers with 15 filters of size 5×5 each, followed by the spatial feature point transformation proposed by Levine et al. (2016), and lastly 3 fully-connected layers of 20 units each. The reward function architecture mirrored that of the policy, but using a quadratic norm on the output, as done by Finn et al. (2016)." |
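The non-visual policy architecture quoted above (a feedforward network with 2 hidden layers of 40 units each) can be sketched as follows. This is an illustrative reconstruction in NumPy, not the authors' code: the state and action dimensions, the ReLU activation, and the weight initialization are placeholder assumptions, since the paper excerpt does not specify them.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    # Placeholder initialization: small Gaussian weights, zero biases.
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

def policy_forward(params, state):
    # Two hidden layers of 40 units (ReLU assumed), linear action output.
    h = state
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = params[-1]
    return h @ W + b

rng = np.random.default_rng(0)
state_dim, action_dim = 10, 2  # placeholder dimensions, not from the paper
sizes = [state_dim, 40, 40, action_dim]
params = [init_layer(rng, sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

action = policy_forward(params, np.zeros(state_dim))
```

The quoted reward architecture would reuse the same trunk but apply a quadratic norm to the output; that variant is omitted here since the excerpt gives no further detail.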