Hardware Conditioned Policies for Multi-Robot Transfer Learning
Authors: Tao Chen, Adithyavairavan Murali, Abhinav Gupta
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our aim is to demonstrate the importance of conditioning the policy on a hardware representation v_h for transferring complex policies between dissimilar robotic agents. We show performance gains in two diverse settings: manipulation and hopper locomotion. (A minimal conditioning sketch appears after the table.) |
| Researcher Affiliation | Academia | Tao Chen (taoc1@cs.cmu.edu), Adithyavairavan Murali (amurali@cs.cmu.edu), and Abhinav Gupta (abhinavg@cs.cmu.edu), all at The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 |
| Pseudocode | Yes | Algorithm 1 Hardware Conditioned Policies (HCP) |
| Open Source Code | No | The paper links to videos of the experiments but does not provide access to source code for the described methodology. |
| Open Datasets | No | The paper describes creating custom robot manipulators and varying their properties within the MuJoCo simulation environment, but it does not provide access information (link, DOI, or citation) for a publicly available dataset. |
| Dataset Splits | Yes | We performed several leave-one-out experiments (train on 8 robot types, leave 1 robot type untouched) across these robot types. (A sketch of this protocol appears after the table.) |
| Hardware Specification | No | The paper mentions running experiments on a 'real Sawyer robot' but does not specify the computing hardware (e.g., CPU, GPU models, memory) used for training models or running simulations. |
| Software Dependencies | No | The paper mentions using MuJoCo as a physics engine and specific DRL algorithms (PPO, DDPG+HER) but does not provide specific version numbers for software libraries, programming languages, or other ancillary dependencies. |
| Experiment Setup | Yes | Rewards: We use a binary sparse reward setting because sparse rewards are more realistic in robotics applications, and we use DDPG+HER as the backbone training algorithm. The agent only gets a +1 reward if the POI is within ϵ (Euclidean distance) of the desired goal position; otherwise, it gets a -1 reward. We use ϵ = 0.02 m in all experiments. (A reward sketch appears after the table.) |
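To make the conditioning in the Research Type row concrete, here is a minimal sketch of a hardware-conditioned policy network. Since no source code is available, this is one plausible reading, not the authors' implementation: it assumes PyTorch, illustrative layer sizes, and the simple design of concatenating the hardware vector v_h (explicit kinematics, or a learned per-robot embedding) with the observation before the policy MLP.

```python
import torch
import torch.nn as nn

class HardwareConditionedPolicy(nn.Module):
    """Sketch of a policy conditioned on a hardware vector v_h.

    v_h could be an explicit kinematics/dynamics encoding or a learned
    per-robot embedding; here it is simply concatenated with the
    observation. Layer sizes and names are illustrative assumptions.
    """

    def __init__(self, state_dim: int, hardware_dim: int,
                 action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + hardware_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
            nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, state: torch.Tensor, v_h: torch.Tensor) -> torch.Tensor:
        # Condition the policy by concatenating state and hardware vector.
        return self.net(torch.cat([state, v_h], dim=-1))


# Usage: one shared policy evaluated with a robot-specific hardware vector.
policy = HardwareConditionedPolicy(state_dim=20, hardware_dim=8, action_dim=7)
state = torch.randn(1, 20)
v_h_robot_a = torch.randn(1, 8)  # hardware encoding of robot A (dummy values)
action = policy(state, v_h_robot_a)
```

Because the hardware vector enters only through the input, the same network weights can act for any robot whose v_h is provided, which is what enables transfer to unseen morphologies.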
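The leave-one-out protocol in the Dataset Splits row can be sketched as follows, assuming nine robot types (consistent with train-on-8/leave-1-out); the type names and the commented train/evaluate calls are hypothetical stand-ins, not details from the paper.

```python
# Leave-one-out over robot types: train on 8, hold 1 out (9 types assumed).
robot_types = [f"robot_type_{i}" for i in range(9)]

for held_out in robot_types:
    train_types = [r for r in robot_types if r != held_out]
    # policy = train_hcp(train_types)        # hypothetical training call
    # success = evaluate(policy, held_out)   # zero-shot transfer check
    print(f"train on {len(train_types)} types, hold out {held_out}")
```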
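Finally, the sparse reward described in the Experiment Setup row is simple enough to write out directly. This is a minimal sketch assuming NumPy and Cartesian POI/goal positions; the ϵ = 0.02 m threshold and the +1/-1 values come from the paper.

```python
import numpy as np

EPSILON = 0.02  # success threshold in meters, as stated in the paper


def sparse_reward(poi_pos: np.ndarray, goal_pos: np.ndarray) -> float:
    """Binary sparse reward: +1 if the point of interest (POI) is within
    EPSILON (Euclidean distance) of the goal position, otherwise -1."""
    distance = np.linalg.norm(poi_pos - goal_pos)
    return 1.0 if distance < EPSILON else -1.0


# Example: a POI 1 cm from the goal counts as success.
print(sparse_reward(np.array([0.0, 0.0, 0.01]), np.zeros(3)))  # 1.0
```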