Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Hardware Conditioned Policies for Multi-Robot Transfer Learning
Authors: Tao Chen, Adithyavairavan Murali, Abhinav Gupta
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our aim is to demonstrate the importance of conditioning the policy on a hardware representation v_h for transferring complicated policies between dissimilar robotic agents. We show performance gains in two diverse settings: manipulation and hopper locomotion. |
| Researcher Affiliation | Academia | Tao Chen, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 (EMAIL); Adithyavairavan Murali, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 (EMAIL); Abhinav Gupta, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 (EMAIL) |
| Pseudocode | Yes | Algorithm 1 Hardware Conditioned Policies (HCP) |
| Open Source Code | No | The paper provides a link for videos of experiments but does not provide concrete access to the source code for the methodology described. |
| Open Datasets | No | The paper describes creating custom robot manipulators and varying their properties within the MuJoCo simulation environment, but it does not provide access information (link, DOI, or citation) for a publicly available dataset. |
| Dataset Splits | Yes | We performed several leave-one-out experiments (train on 8 robot types, leave 1 robot type untouched) on these robot types. |
| Hardware Specification | No | The paper mentions running experiments on a 'real Sawyer robot' but does not specify the computing hardware (e.g., CPU, GPU models, memory) used for training models or running simulations. |
| Software Dependencies | No | The paper mentions using MuJoCo as a physics engine and specific DRL algorithms (PPO, DDPG+HER) but does not provide specific version numbers for software libraries, programming languages, or other ancillary dependencies. |
| Experiment Setup | Yes | Rewards: We use a binary sparse reward setting because sparse reward is more realistic in robotics applications, and we use DDPG+HER as the backbone training algorithm. The agent only gets a +1 reward if the POI is within ϵ Euclidean distance of the desired goal position; otherwise, it gets a −1 reward. We use ϵ = 0.02 m in all experiments. |
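The two mechanisms quoted in the table — conditioning the policy input on a hardware representation v_h, and the binary sparse reward with ϵ = 0.02 m — can be sketched as below. This is a minimal illustration under assumed interfaces, not the authors' implementation; the function and argument names (`sparse_reward`, `conditioned_observation`, `hardware_repr`) are hypothetical.

```python
import numpy as np

EPSILON = 0.02  # meters, per the paper's experiment setup


def sparse_reward(poi_pos, goal_pos, epsilon=EPSILON):
    """Binary sparse reward: +1 if the point of interest (POI) is within
    epsilon Euclidean distance of the goal position, otherwise -1."""
    distance = np.linalg.norm(np.asarray(poi_pos) - np.asarray(goal_pos))
    return 1.0 if distance < epsilon else -1.0


def conditioned_observation(robot_obs, hardware_repr):
    """Hypothetical sketch of hardware conditioning: concatenate the
    hardware representation v_h onto the raw observation so a single
    policy network can be shared across dissimilar robots."""
    return np.concatenate([np.asarray(robot_obs, dtype=np.float64),
                           np.asarray(hardware_repr, dtype=np.float64)])
```

In this sketch the conditioned observation would then be fed to the DDPG+HER policy; the sparse reward makes hindsight goal relabeling (HER) the mechanism that provides learning signal despite the rarity of +1 rewards.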