Discovering Motor Programs by Recomposing Demonstrations
Authors: Tanmay Shankar, Shubham Tulsiani, Lerrel Pinto, Abhinav Gupta
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate, both qualitatively and quantitatively, that our learned primitives capture semantically meaningful aspects of a demonstration. This allows us to compose these primitives in a hierarchical reinforcement learning setup to efficiently solve robotic manipulation tasks like reaching and pushing. |
| Researcher Affiliation | Collaboration | Tanmay Shankar (Facebook AI Research, tanmayshankar@fb.com); Shubham Tulsiani (Facebook AI Research, shubhtuls@fb.com); Lerrel Pinto (Robotics Institute, CMU, lerrelp@cs.cmu.edu); Abhinav Gupta (Facebook AI Research, gabhinav@fb.com) |
| Pseudocode | No | The paper does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper links to a webpage for visualizations (e.g., 'We provide a GIF version of Fig. 3 (and other visualizations) at https://sites.google.com/view/discovering-motor-programs/home.'), but does not explicitly state that the source code for the described methodology is available. |
| Open Datasets | Yes | We use the MIME dataset (Sharma et al., 2018) to train and evaluate our model. |
| Dataset Splits | Yes | We randomly sample a train set of 5900 demonstrations from all 20 tasks, with a validation set of 1600 trajectories, and a held-out test set of 850 trajectories. |
| Hardware Specification | No | The paper mentions the Baxter Robot for data collection and execution (e.g., 'collected on a real-world Baxter Robot.'), but does not specify the hardware used for training the models (e.g., specific GPU or CPU models). |
| Software Dependencies | No | The paper mentions software components like LSTM, Transformer, and Proximal Policy Optimization (PPO), but does not provide specific version numbers for these or other libraries (e.g., 'the motor program network is a 4 layer LSTM (Graves et al., 2013)', 'We train both our motor program policy and the baseline control policy using Proximal Policy Optimization (Schulman et al., 2017)'). |
| Experiment Setup | Yes | In particular, the motor program network is a 4-layer LSTM (Graves et al., 2013) that takes a single 64-dimensional latent variable z as input and predicts a sequence of 16-dimensional states. For our abstraction network, we adopt the Transformer (Vaswani et al., 2017) architecture to take in a varying-length 16-dimensional continuous joint angle trajectory τ as input, and predict a variable number of latent variables {z} that correspond to the sequence of motor programs {M} executed during trajectory τ. For the Baxter-Reaching task, the goal is to get the right hand's end-effector to a pre-defined goal (x, y, z) state. The reward is a sparse reward with ε = 0.05 m; if the end-effector reaches within 5 cm of the goal, it gets a reward of +1, otherwise it gets a reward of 0. Each z expands into a length-10 trajectory according to the motor program network. To reach each of these trajectory states, a PD velocity controller is used for 5 time-steps. The baseline control policy directly outputs the velocity control action. We note that our PPO baseline applies the same action for 10 timesteps, in a manner similar to frame-skipping, as is common in RL. (Hedged code sketches of this setup follow the table.) |
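
The Experiment Setup row fixes a few architectural details: a 4-layer LSTM that decodes a single 64-dimensional latent into a sequence of 16-dimensional states, and a Transformer abstraction network over a variable-length trajectory. The following is a minimal PyTorch-style sketch of how those two networks could be wired together; everything not stated in the row (hidden sizes, attention heads, how z conditions the LSTM, how per-step features become a variable number of latents, and the class names) is an assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Sketch of the two networks named in the Experiment Setup row, assuming PyTorch.
# Only the 4-layer LSTM, 64-d latent, 16-d states, and length-10 expansion come from
# the paper; hidden sizes, head counts, and per-step conditioning on z are assumptions.
class MotorProgramDecoder(nn.Module):
    def __init__(self, latent_dim=64, state_dim=16, hidden_dim=128, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.lstm = nn.LSTM(latent_dim, hidden_dim, num_layers=4, batch_first=True)
        self.out = nn.Linear(hidden_dim, state_dim)

    def forward(self, z):
        # z: (batch, 64); feed the same latent at every step as a simple conditioning choice.
        inp = z.unsqueeze(1).repeat(1, self.horizon, 1)   # (batch, 10, 64)
        hidden, _ = self.lstm(inp)                        # (batch, 10, hidden_dim)
        return self.out(hidden)                           # (batch, 10, 16) predicted states


class AbstractionEncoder(nn.Module):
    """Transformer encoder over a variable-length 16-d joint-angle trajectory; how
    per-step features are pooled into a variable number of latents is omitted here."""
    def __init__(self, state_dim=16, latent_dim=64, d_model=128, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.to_latent = nn.Linear(d_model, latent_dim)

    def forward(self, traj):
        # traj: (batch, T, 16) -> per-step 64-d latents (batch, T, 64)
        return self.to_latent(self.encoder(self.embed(traj)))
```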
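Similarly, the Baxter-Reaching description (sparse +1 reward within 5 cm, each latent z expanded into a length-10 trajectory, a velocity controller run for 5 time-steps per waypoint) can be summarized as a small control loop. The sketch below is illustrative only: the environment accessors, the controller gain, and the use of a plain proportional controller in place of the paper's PD velocity controller are assumptions.

```python
import numpy as np

# Sketch of the sparse reaching reward and macro-action expansion described above,
# assuming a decoded motor program is a (10, 16) array of target joint states.
# env.get_joint_positions / env.step_velocity are hypothetical accessors.
EPSILON = 0.05  # metres (the 5 cm threshold stated in the paper)

def sparse_reaching_reward(ee_pos, goal_pos, epsilon=EPSILON):
    """+1 if the end-effector is within epsilon of the goal, else 0."""
    return 1.0 if np.linalg.norm(np.asarray(ee_pos) - np.asarray(goal_pos)) < epsilon else 0.0

def execute_motor_program(env, waypoints, kp=2.0, steps_per_waypoint=5):
    """Track each of the 10 decoded joint-state waypoints with a velocity controller
    for 5 control steps, accumulating the sparse reward along the way."""
    total_reward = 0.0
    for target in waypoints:                      # waypoints: (10, 16)
        for _ in range(steps_per_waypoint):
            q = env.get_joint_positions()         # hypothetical accessor
            qdot = kp * (target - q)              # proportional velocity command (P-only stand-in)
            obs = env.step_velocity(qdot)         # hypothetical stepper returning a dict
            total_reward += sparse_reaching_reward(obs["ee_pos"], obs["goal_pos"])
    return total_reward
```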