Learning Robot Skills with Temporal Variational Inference

Authors: Tanmay Shankar, Abhinav Gupta

ICML 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate the ability of our framework to learn such options across three robotic demonstration datasets, and provide our code. We evaluate our approach's ability to learn options across three datasets, and demonstrate that our approach can learn a meaningful space of options that correspond with traditional skills in manipulation, visualized at https://sites.google.com/view/learning-causalskills/home. We quantify the effectiveness of our policies in solving downstream tasks, evaluated on a suite of tasks." |
| Researcher Affiliation | Industry | "Facebook AI Research, Pittsburgh, PA, USA." |
| Pseudocode | Yes | Algorithm 1: Trajectory Generation Process with Options; Algorithm 2: Temporal Variational Inference for Learning Skills. (A hedged sketch of such an option-conditioned generation loop follows this table.) |
| Open Source Code | Yes | "We demonstrate the ability of our framework to learn such options across three robotic demonstration datasets, and provide our code." Code: github.com/facebookresearch/CausalSkillLearning |
| Open Datasets | Yes | MIME Dataset (Sharma et al., 2018); Roboturk Dataset (Mandlekar et al., 2018); CMU Mocap Dataset (CMU, 2002) |
| Dataset Splits | No | "For each dataset, we set aside 500 randomly sampled trajectories that serve as our test set for our experiments in section 4.2. The remaining trajectories serve as the respective training sets." The paper does not explicitly mention a separate validation split. (A minimal split sketch follows this table.) |
| Hardware Specification | No | The paper states: "The RL based approaches are trained with DDPG with the same exploration processes and hyperparameters (such as initializations of the networks, learning rates used, etc.), as noted in the supplementary." However, it does not provide any specific details about the hardware used (e.g., GPU models, CPU types) in the main paper. |
| Software Dependencies | No | The paper mentions using LSTMs and DDPG but does not specify any software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x) for its implementation or experiments in the main text. |
| Experiment Setup | Yes | "We parameterize each of the policies π and η as LSTMs (Hochreiter & Schmidhuber, 1997), with 8 layers and 128 hidden units per layer. All baseline policies are implemented as 8 layer LSTMs with 128 hidden units, for direct comparison with our policies." (A PyTorch sketch matching these quoted sizes follows this table.) |