Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning

Authors: Abhishek Gupta, Coline Devin, YuXuan Liu, Pieter Abbeel, Sergey Levine

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our transfer learning algorithm in two simulated robotic manipulation skills, and illustrate that we can transfer knowledge between simulated robotic arms with different numbers of links, as well as simulated arms with different actuation mechanisms, where one robot is torque-driven while the other is tendon-driven. (Section 5, Experiments) Our experiments aim to evaluate how well common feature space learning can transfer skills between morphologically different agents. The experiments were performed in simulation using the MuJoCo physics simulator (Todorov et al., 2012), in order to explore a variety of different robots and actuation mechanisms.
Researcher Affiliation | Collaboration | UC Berkeley, Department of Electrical Engineering and Computer Science; OpenAI. {abhigupta,coline,svlevine}@eecs.berkeley.edu, {yuxuanliu}@berkeley.edu, {pieter}@openai.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'Videos of our experiment will be available at https://sites.google.com/site/invariantfeaturetransfer/' but does not provide a link to the source code for the methodology.
Open Datasets | No | The paper describes experiments performed in simulation using the MuJoCo physics simulator and generates its own data through these simulations. It does not refer to a publicly available or open dataset with access information (link, DOI, citation).
Dataset Splits | No | The paper does not provide specific details about training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions using the MuJoCo physics simulator.
Software Dependencies | No | The paper mentions the 'MuJoCo physics simulator (Todorov et al., 2012)' and the 'ADAM optimizer (Kingma & Ba, 2015)' but does not specify version numbers for these or any other software components or libraries.
Experiment Setup | Yes | The embedding functions f and g in our experiments are 3 layer neural networks with 60 hidden units each and ReLU non-linearities. They are trained end-to-end with standard backpropagation using the ADAM optimizer (Kingma & Ba, 2015). We use a simple trajectory-centric reinforcement learning method that trains time-varying linear-Gaussian policies (Levine & Abbeel, 2014). This term has the following form: r_transfer(s^{(t)}_{T,r}) = α ||f(s^{(t)}_{S,r}; θ_f) − g(s^{(t)}_{T,r}; θ_g)||^2, where ... α is a weight on the transfer reward that controls its importance relative to the overall task goal. (A minimal code sketch of this setup follows the table.)
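
As a concrete illustration of the quoted setup, the sketch below builds the two embedding networks and the transfer term. It is a minimal sketch, assuming PyTorch (the paper does not name a deep learning framework); the 3-layer, 60-unit ReLU MLPs, the Adam optimizer, and the α-weighted distance term come from the quote above, while the state dimensions, learning rate, batch size, and all variable names are placeholders chosen for illustration.

```python
# Minimal sketch, assuming PyTorch (the paper does not specify a framework).
# f and g map source/target states into a shared feature space; the transfer
# term measures the distance between time-aligned embeddings, weighted by alpha.
import torch
import torch.nn as nn


def make_embedding(state_dim: int, hidden: int = 60, feature_dim: int = 60) -> nn.Sequential:
    # "3 layer neural networks with 60 hidden units each and ReLU non-linearities"
    return nn.Sequential(
        nn.Linear(state_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, feature_dim),
    )


# State dimensions are placeholders; the two agents may have different state spaces.
f = make_embedding(state_dim=12)  # source embedding f(.; theta_f)
g = make_embedding(state_dim=18)  # target embedding g(.; theta_g)


def transfer_term(s_source: torch.Tensor, s_target: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # alpha * || f(s_S) - g(s_T) ||^2 for time-aligned source/target states,
    # matching the quoted formula; how this term is combined with the overall
    # task reward (sign and weighting) follows the paper, not this sketch.
    return alpha * torch.sum((f(s_source) - g(s_target)) ** 2, dim=-1)


# "trained end-to-end with standard backpropagation using the ADAM optimizer";
# the learning rate is an assumption.
optimizer = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

# Example usage on a batch of time-aligned states (shapes are illustrative):
s_S = torch.randn(32, 12)
s_T = torch.randn(32, 18)
loss = transfer_term(s_S, s_T).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the quoted setup, this same distance appears twice: it is minimized with Adam when training f and g end-to-end, and it supplies the α-weighted transfer reward r_transfer used alongside the task objective during reinforcement learning.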