Hardware Conditioned Policies for Multi-Robot Transfer Learning

Authors: Tao Chen, Adithyavairavan Murali, Abhinav Gupta

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our aim is to demonstrate the importance of conditioning the policy based on a hardware representation v_h for transferring complicated policies between dissimilar robotic agents. We show performance gains on two diverse settings of manipulation and hopper.
Researcher Affiliation | Academia | Tao Chen, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 (taoc1@cs.cmu.edu); Adithyavairavan Murali, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 (amurali@cs.cmu.edu); Abhinav Gupta, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 (abhinavg@cs.cmu.edu)
Pseudocode | Yes | Algorithm 1: Hardware Conditioned Policies (HCP)
Open Source Code | No | The paper provides a link to videos of the experiments but does not provide access to the source code for the described methodology.
Open Datasets | No | The paper describes creating custom robot manipulators and varying their properties within the MuJoCo simulation environment, but it does not provide access information (link, DOI, or citation) for a publicly available dataset.
Dataset Splits | Yes | We performed several leave-one-out experiments (train on 8 robot types, leave 1 robot type untouched) on these robot types.
Hardware Specification | No | The paper mentions running experiments on a real Sawyer robot but does not specify the computing hardware (e.g., CPU or GPU models, memory) used for training models or running simulations.
Software Dependencies | No | The paper mentions using MuJoCo as the physics engine and specific DRL algorithms (PPO, DDPG+HER) but does not provide version numbers for software libraries, programming languages, or other dependencies.
Experiment Setup | Yes | Rewards: We use a binary sparse reward setting because sparse reward is more realistic in robotics applications, and we use DDPG+HER as the backbone training algorithm. The agent only gets a +1 reward if the POI is within ε Euclidean distance of the desired goal position; otherwise, it gets a -1 reward. We use ε = 0.02 m in all experiments.
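To make the Research Type row above concrete, here is a minimal sketch of the conditioning idea: a policy network that takes the hardware representation v_h as an extra input alongside the state observation, so a single policy can act across dissimilar robot bodies. The class name, layer sizes, and activation choices are illustrative assumptions (written with PyTorch), not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HardwareConditionedPolicy(nn.Module):
    """Sketch of a hardware-conditioned policy pi(a | s, v_h):
    the hardware representation v_h is concatenated with the state
    observation before the policy MLP. Sizes are assumptions."""

    def __init__(self, obs_dim: int, vh_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + vh_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
            nn.Tanh(),  # assumes actions normalized to [-1, 1]
        )

    def forward(self, obs: torch.Tensor, v_h: torch.Tensor) -> torch.Tensor:
        # Treating v_h as an extra input lets one network be trained on
        # many robot types and reused on hardware it has not seen.
        return self.net(torch.cat([obs, v_h], dim=-1))
```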
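The Dataset Splits row describes a leave-one-out protocol: train on 8 robot types and hold 1 type out for evaluation. A minimal sketch of generating such splits is below; the function name and the use of plain string identifiers are assumptions.

```python
def leave_one_out_splits(robot_types):
    """Yield (train_types, held_out) pairs: for each robot type,
    train on the remaining types and evaluate transfer on the one left out."""
    for held_out in robot_types:
        train_types = [r for r in robot_types if r != held_out]
        yield train_types, held_out

# Example with nine hypothetical robot-type identifiers.
splits = list(leave_one_out_splits([f"robot_{i}" for i in range(9)]))
assert all(len(train) == 8 for train, _ in splits)
```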
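The Experiment Setup row fully specifies the binary sparse reward, so it can be written directly as a small function. The function name and NumPy-array inputs are assumptions; the +1 / -1 values and ε = 0.02 m come from the row above.

```python
import numpy as np

def sparse_reward(poi_pos, goal_pos, eps=0.02):
    """Binary sparse reward: +1 if the point of interest (POI) is within
    eps (0.02 m) Euclidean distance of the desired goal, otherwise -1."""
    dist = np.linalg.norm(np.asarray(poi_pos) - np.asarray(goal_pos))
    return 1.0 if dist < eps else -1.0
```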