Learning Robot Skills with Temporal Variational Inference
Authors: Tanmay Shankar, Abhinav Gupta
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the ability of our framework to learn such options across three robotic demonstration datasets, and provide our code. We evaluate our approach's ability to learn options across three datasets, and demonstrate that our approach can learn a meaningful space of options that correspond with traditional skills in manipulation, visualized at https://sites.google.com/view/learning-causalskills/home. We quantify the effectiveness of our policies in solving downstream tasks, evaluated on a suite of tasks. |
| Researcher Affiliation | Industry | 1Facebook AI Research, Pittsburgh, PA, USA. |
| Pseudocode | Yes | Algorithm 1 Trajectory Generation Process with Options; Algorithm 2 Temporal Variational Inference for Learning Skills |
| Open Source Code | Yes | We demonstrate the ability of our framework to learn such options across three robotic demonstration datasets, and provide our code (github.com/facebookresearch/CausalSkillLearning). |
| Open Datasets | Yes | MIME Dataset (Sharma et al., 2018); Roboturk Dataset (Mandlekar et al., 2018); CMU Mocap Dataset (CMU, 2002) |
| Dataset Splits | No | For each dataset, we set aside 500 randomly sampled trajectories that serve as our test set for our experiments in section 4.2. The remaining trajectories serve as the respective training sets. The paper does not explicitly mention a separate validation split. |
| Hardware Specification | No | The paper states: "The RL based approaches are trained with DDPG with the same exploration processes and hyperparameters (such as initializations of the networks, learning rates used, etc.), as noted in the supplementary." However, it does not provide any specific details about the hardware used (e.g., GPU models, CPU types) in the main paper. |
| Software Dependencies | No | The paper mentions using LSTMs and DDPG but does not specify any software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x) for its implementation or experiments in the main text. |
| Experiment Setup | Yes | We parameterize each of the policies π and η as LSTMs (Hochreiter & Schmidhuber, 1997), with 8 layers and 128 hidden units per layer. All baseline policies are implemented as 8 layer LSTMs with 128 hidden units, for direct comparison with our policies. |
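The architecture quoted in the Experiment Setup row (8-layer LSTMs with 128 hidden units per layer, used for both the authors' policies and the baselines) can be sketched as follows. This is a hypothetical reconstruction, not the authors' released code: the framework choice (PyTorch), class name, and the state/action dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Illustrative sketch of the policy parameterization described in the
    paper: an 8-layer LSTM with 128 hidden units per layer, followed by a
    linear output head. The state and action dimensions below are assumed
    placeholders, not values from the paper."""

    def __init__(self, state_dim=16, action_dim=7, hidden_size=128, num_layers=8):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, action_dim)

    def forward(self, states, hidden=None):
        # states: (batch, timesteps, state_dim)
        out, hidden = self.lstm(states, hidden)
        return self.head(out), hidden

policy = LSTMPolicy()
# One 50-step demonstration trajectory with a 16-dimensional state.
actions, _ = policy(torch.randn(1, 50, 16))
print(actions.shape)  # torch.Size([1, 50, 7])
```

The same module shape would serve for both the learned option policies and the baseline policies, consistent with the paper's note that baselines use identical architectures for direct comparison.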