Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning
Authors: Abhishek Gupta, Coline Devin, YuXuan Liu, Pieter Abbeel, Sergey Levine
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our transfer learning algorithm in two simulated robotic manipulation skills, and illustrate that we can transfer knowledge between simulated robotic arms with different numbers of links, as well as simulated arms with different actuation mechanisms, where one robot is torque-driven while the other is tendon-driven. (Section 5, Experiments) Our experiments aim to evaluate how well common feature space learning can transfer skills between morphologically different agents. The experiments were performed in simulation using the MuJoCo physics simulator (Todorov et al., 2012), in order to explore a variety of different robots and actuation mechanisms. |
| Researcher Affiliation | Collaboration | UC Berkeley, Department of Electrical Engineering and Computer Science; OpenAI. {abhigupta,coline,svlevine}@eecs.berkeley.edu, {yuxuanliu}@berkeley.edu, {pieter}@openai.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'Videos of our experiment will be available at https://sites.google.com/site/invariantfeaturetransfer/' but does not provide a link to the source code for the methodology. |
| Open Datasets | No | The paper describes experiments performed in simulation using the MuJoCo physics simulator and generates its own data through these simulations. It does not refer to a publicly available or open dataset with access information (link, DOI, citation). |
| Dataset Splits | No | The paper does not provide specific details about training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions using the MuJoCo physics simulator. |
| Software Dependencies | No | The paper mentions the 'MuJoCo physics simulator (Todorov et al., 2012)' and the 'ADAM optimizer (Kingma & Ba, 2015)' but does not specify version numbers for these or any other software components or libraries. |
| Experiment Setup | Yes | The embedding functions f and g in our experiments are 3-layer neural networks with 60 hidden units each and ReLU non-linearities. They are trained end-to-end with standard backpropagation using the ADAM optimizer (Kingma & Ba, 2015). We use a simple trajectory-centric reinforcement learning method that trains time-varying linear-Gaussian policies (Levine & Abbeel, 2014). This term has the following form: $r_{\text{transfer}}(s^{(t)}_{T,r}) = \alpha\,\lVert f(s^{(t)}_{S,r};\theta_f) - g(s^{(t)}_{T,r};\theta_g)\rVert_2$, where ... α is a weight on the transfer reward that controls its importance relative to the overall task goal. |
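
For concreteness, here is a minimal PyTorch sketch of the setup described above. The 3-layer/60-unit architecture, ReLU activations, ADAM optimizer, and the α-weighted L2 transfer-reward term follow the paper's quoted description; the state dimensions, the embedding output size, and the `transfer_reward` helper are hypothetical placeholders, and the trajectory-centric RL method (Levine & Abbeel, 2014) that consumes this reward is omitted.

```python
import torch
import torch.nn as nn

class Embedding(nn.Module):
    """3-layer MLP with 60 hidden units and ReLU non-linearities,
    as the paper describes the embedding functions f and g."""
    def __init__(self, state_dim: int, feature_dim: int = 60):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 60), nn.ReLU(),
            nn.Linear(60, 60), nn.ReLU(),
            nn.Linear(60, feature_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

# State dimensions are placeholders; the paper does not specify them.
f = Embedding(state_dim=12)  # f(.; theta_f), source-robot embedding
g = Embedding(state_dim=16)  # g(.; theta_g), target-robot embedding

# Both embeddings are trained end-to-end with ADAM, per the paper.
optimizer = torch.optim.Adam(list(f.parameters()) + list(g.parameters()))

def transfer_reward(s_source: torch.Tensor, s_target: torch.Tensor,
                    alpha: float = 1.0) -> torch.Tensor:
    """Alpha-weighted L2 distance between the two embeddings:
    r_transfer = alpha * ||f(s_S; theta_f) - g(s_T; theta_g)||_2."""
    return alpha * torch.norm(f(s_source) - g(s_target), dim=-1)
```

In use, batches of source and target states paired along corresponding trajectories would be passed through `transfer_reward`, and the resulting term combined with the task reward when training the target-robot policy.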