Discovering Motor Programs by Recomposing Demonstrations
Authors: Tanmay Shankar, Shubham Tulsiani, Lerrel Pinto, Abhinav Gupta
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate, both qualitatively and quantitatively, that our learned primitives capture semantically meaningful aspects of a demonstration. This allows us to compose these primitives in a hierarchical reinforcement learning setup to efficiently solve robotic manipulation tasks like reaching and pushing. |
| Researcher Affiliation | Collaboration | Tanmay Shankar (Facebook AI Research, tanmayshankar@fb.com); Shubham Tulsiani (Facebook AI Research, shubhtuls@fb.com); Lerrel Pinto (Robotics Institute, CMU, lerrelp@cs.cmu.edu); Abhinav Gupta (Facebook AI Research, gabhinav@fb.com) |
| Pseudocode | No | The paper does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper links to a webpage for visualizations (e.g., 'We provide a GIF version of Fig. 3 (and other visualizations) at https://sites.google.com/view/discovering-motor-programs/home.'), but does not explicitly state that the source code for the described methodology is available. |
| Open Datasets | Yes | We use the MIME dataset (Sharma et al., 2018) to train and evaluate our model. |
| Dataset Splits | Yes | We randomly sample a train set of 5900 demonstrations from all 20 tasks, with a validation set of 1600 trajectories, and a held-out test set of 850 trajectories. |
| Hardware Specification | No | The paper mentions the Baxter Robot for data collection and execution (e.g., 'collected on a real-world Baxter Robot.'), but does not specify the hardware used for training the models (e.g., specific GPU or CPU models). |
| Software Dependencies | No | The paper mentions software components like LSTM, Transformer, and Proximal Policy Optimization (PPO), but does not provide specific version numbers for these or other libraries (e.g., 'the motor program network is a 4 layer LSTM (Graves et al., 2013)', 'We train both our motor program policy and the baseline control policy using Proximal Policy Optimization (Schulman et al., 2017)'). |
| Experiment Setup | Yes | In particular, the motor program network is a 4-layer LSTM (Graves et al., 2013) that takes a single 64-dimensional latent variable z as input and predicts a sequence of 16-dimensional states. For our abstraction network, we adopt the Transformer (Vaswani et al., 2017) architecture to take in a varying-length 16-dimensional continuous joint angle trajectory τ as input, and predict a variable number of latent variables {z} that correspond to the sequence of motor programs {M} executed during trajectory τ. For the Baxter-Reaching task, the goal is to get the right hand's end-effector to a pre-defined goal (x, y, z) state. The reward is a sparse reward with ε = 0.05 m; if the end-effector reaches within 5 cm of the goal, it gets a reward of +1, otherwise it gets a reward of 0. Each z expands into a length-10 trajectory according to the motor program network. To reach each of these trajectory states, a PD velocity controller is used for 5 time-steps. The baseline control policy directly outputs the velocity control action. We note that our PPO baseline applies the same action for 10 timesteps, in a manner similar to frame-skipping, as is common in RL. (Hedged code sketches of this setup follow the table.) |
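
The Experiment Setup row fixes a few architectural details: a 4-layer LSTM that decodes a single 64-dimensional latent into a sequence of 16-dimensional states, and a Transformer abstraction network over a variable-length trajectory. The following is a minimal PyTorch-style sketch of how those two networks could be wired together; everything not stated in the row (hidden sizes, attention heads, how z conditions the LSTM, how per-step features become a variable number of latents, and the class names) is an assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Sketch of the two networks named in the Experiment Setup row, assuming PyTorch.
# Only the 4-layer LSTM, 64-d latent, 16-d states, and length-10 expansion come from
# the paper; hidden sizes, head counts, and per-step conditioning on z are assumptions.
class MotorProgramDecoder(nn.Module):
    def __init__(self, latent_dim=64, state_dim=16, hidden_dim=128, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.lstm = nn.LSTM(latent_dim, hidden_dim, num_layers=4, batch_first=True)
        self.out = nn.Linear(hidden_dim, state_dim)

    def forward(self, z):
        # z: (batch, 64); feed the same latent at every step as a simple conditioning choice.
        inp = z.unsqueeze(1).repeat(1, self.horizon, 1)   # (batch, 10, 64)
        hidden, _ = self.lstm(inp)                        # (batch, 10, hidden_dim)
        return self.out(hidden)                           # (batch, 10, 16) predicted states


class AbstractionEncoder(nn.Module):
    """Transformer encoder over a variable-length 16-d joint-angle trajectory; how
    per-step features are pooled into a variable number of latents is omitted here."""
    def __init__(self, state_dim=16, latent_dim=64, d_model=128, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.to_latent = nn.Linear(d_model, latent_dim)

    def forward(self, traj):
        # traj: (batch, T, 16) -> per-step 64-d latents (batch, T, 64)
        return self.to_latent(self.encoder(self.embed(traj)))
```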
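Similarly, the Baxter-Reaching description (sparse +1 reward within 5 cm, each latent z expanded into a length-10 trajectory, a velocity controller run for 5 time-steps per waypoint) can be summarized as a small control loop. The sketch below is illustrative only: the environment accessors, the controller gain, and the use of a plain proportional controller in place of the paper's PD velocity controller are assumptions.

```python
import numpy as np

# Sketch of the sparse reaching reward and macro-action expansion described above,
# assuming a decoded motor program is a (10, 16) array of target joint states.
# env.get_joint_positions / env.step_velocity are hypothetical accessors.
EPSILON = 0.05  # metres (the 5 cm threshold stated in the paper)

def sparse_reaching_reward(ee_pos, goal_pos, epsilon=EPSILON):
    """+1 if the end-effector is within epsilon of the goal, else 0."""
    return 1.0 if np.linalg.norm(np.asarray(ee_pos) - np.asarray(goal_pos)) < epsilon else 0.0

def execute_motor_program(env, waypoints, kp=2.0, steps_per_waypoint=5):
    """Track each of the 10 decoded joint-state waypoints with a velocity controller
    for 5 control steps, accumulating the sparse reward along the way."""
    total_reward = 0.0
    for target in waypoints:                      # waypoints: (10, 16)
        for _ in range(steps_per_waypoint):
            q = env.get_joint_positions()         # hypothetical accessor
            qdot = kp * (target - q)              # proportional velocity command (P-only stand-in)
            obs = env.step_velocity(qdot)         # hypothetical stepper returning a dict
            total_reward += sparse_reaching_reward(obs["ee_pos"], obs["goal_pos"])
    return total_reward
```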