Hardware Conditioned Policies for Multi-Robot Transfer Learning

Authors: Tao Chen, Adithyavairavan Murali, Abhinav Gupta

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our aim is to demonstrate the importance of conditioning the policy based on a hardware representation v_h for transferring complicated policies between dissimilar robotic agents. We show performance gains on two diverse settings of manipulation and hopper.
Researcher Affiliation | Academia | Tao Chen, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 (taoc1@cs.cmu.edu); Adithyavairavan Murali, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 (amurali@cs.cmu.edu); Abhinav Gupta, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 (abhinavg@cs.cmu.edu)
Pseudocode | Yes | Algorithm 1: Hardware Conditioned Policies (HCP)
Open Source Code | No | The paper provides a link to videos of the experiments but does not provide access to the source code for the described methodology.
Open Datasets | No | The paper describes creating custom robot manipulators and varying their properties within the MuJoCo simulation environment, but it does not provide access information (link, DOI, or citation) for a publicly available dataset.
Dataset Splits | Yes | We performed several leave-one-out experiments (train on 8 robot types, leave 1 robot type untouched) on these robot types.
Hardware Specification | No | The paper mentions running experiments on a real Sawyer robot but does not specify the computing hardware (e.g., CPU or GPU models, memory) used for training models or running simulations.
Software Dependencies | No | The paper mentions using MuJoCo as the physics engine and specific DRL algorithms (PPO, DDPG+HER) but does not provide version numbers for software libraries, programming languages, or other dependencies.
Experiment Setup | Yes | Rewards: We use a binary sparse reward setting because sparse reward is more realistic in robotics applications, and we use DDPG+HER as the backbone training algorithm. The agent only gets a +1 reward if the POI is within ε Euclidean distance of the desired goal position; otherwise, it gets a -1 reward. We use ε = 0.02 m in all experiments.
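To make the Research Type row above concrete, here is a minimal sketch of the conditioning idea: a policy network that takes the hardware representation v_h as an extra input alongside the state observation, so a single policy can act across dissimilar robot bodies. The class name, layer sizes, and activation choices are illustrative assumptions (written with PyTorch), not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HardwareConditionedPolicy(nn.Module):
    """Sketch of a hardware-conditioned policy pi(a | s, v_h):
    the hardware representation v_h is concatenated with the state
    observation before the policy MLP. Sizes are assumptions."""

    def __init__(self, obs_dim: int, vh_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + vh_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
            nn.Tanh(),  # assumes actions normalized to [-1, 1]
        )

    def forward(self, obs: torch.Tensor, v_h: torch.Tensor) -> torch.Tensor:
        # Treating v_h as an extra input lets one network be trained on
        # many robot types and reused on hardware it has not seen.
        return self.net(torch.cat([obs, v_h], dim=-1))
```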
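The Dataset Splits row describes a leave-one-out protocol: train on 8 robot types and hold 1 type out for evaluation. A minimal sketch of generating such splits is below; the function name and the use of plain string identifiers are assumptions.

```python
def leave_one_out_splits(robot_types):
    """Yield (train_types, held_out) pairs: for each robot type,
    train on the remaining types and evaluate transfer on the one left out."""
    for held_out in robot_types:
        train_types = [r for r in robot_types if r != held_out]
        yield train_types, held_out

# Example with nine hypothetical robot-type identifiers.
splits = list(leave_one_out_splits([f"robot_{i}" for i in range(9)]))
assert all(len(train) == 8 for train, _ in splits)
```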
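The Experiment Setup row fully specifies the binary sparse reward, so it can be written directly as a small function. The function name and NumPy-array inputs are assumptions; the +1 / -1 values and ε = 0.02 m come from the row above.

```python
import numpy as np

def sparse_reward(poi_pos, goal_pos, eps=0.02):
    """Binary sparse reward: +1 if the point of interest (POI) is within
    eps (0.02 m) Euclidean distance of the desired goal, otherwise -1."""
    dist = np.linalg.norm(np.asarray(poi_pos) - np.asarray(goal_pos))
    return 1.0 if dist < eps else -1.0
```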