Unsupervised Learning of Visual 3D Keypoints for Control

Authors: Boyuan Chen, Pieter Abbeel, Deepak Pathak

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate Keypoint3D across a variety of reinforcement learning benchmark environments, and we perform the following analyses. We first investigate how well our Keypoint3D representations perform compared to other representations for RL. Second, we test the scalability to higher-dimensional control problems. Third, we show that our Keypoint3D-based policy is capable of manipulating deformable objects, as evident from results on a task where a robot must put a scarf around a human mannequin. Finally, we show that our Keypoint3D representations generalize across tasks as well. Our method outperforms prior state-of-the-art across almost all the environments, and our ablation study demonstrates its robustness across several design choices."
Researcher Affiliation | Academia | "1UC Berkeley, 2Carnegie Mellon University. Correspondence to: Deepak Pathak <dpathak@cs.cmu.edu>."
Pseudocode | Yes | "Algorithm 1 Keypoint3D: RL with 3D Keypoint Bottleneck"
Open Source Code | Yes | "Code and videos at https://buoyancy99.github.io/unsup-3d-keypoints/."
Open Datasets | Yes | "We choose a set of 3D manipulation environments (Yu et al., 2019), a high-DoF 3D locomotion environment (Coumans & Bai, 2016-2019), a customized soft-body environment, and a meta-learning benchmark (Yu et al., 2019) to evaluate our method from different perspectives. These environments were originally developed for state-based RL and are hard tasks for pixel-based RL."
Dataset Splits | Yes | "We follow the train/test split of the benchmark, first pre-training our method and baselines on 45 training environments featuring distinct objects for 10M steps. We then conduct transfer learning on 5 unseen test environments with the pre-trained weights for 2M steps."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions that the baselines are implemented on top of PPO, but it does not specify version numbers for PPO or any other software libraries or dependencies.
Experiment Setup | Yes | "As locomotion environments require temporal reasoning, we use a frame stack of 2."
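Algorithm 1 itself is not reproduced in this report. For orientation, the keypoint-bottleneck idea it refers to is conventionally built on a spatial soft-argmax: each predicted heatmap is converted into an expected image coordinate, giving a small, differentiable set of keypoints the policy can consume. The sketch below is a minimal 2D NumPy illustration of that standard operator, not the paper's implementation; Keypoint3D additionally lifts keypoints to 3D using multi-view camera geometry, which is omitted here.

```python
import numpy as np

def spatial_soft_argmax(heatmaps):
    """Differentiable keypoint extraction (illustrative sketch).

    heatmaps: array of shape (K, H, W), one unnormalized map per keypoint.
    Returns (K, 2) expected (x, y) coordinates, each in [-1, 1].
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)      # per-map spatial softmax
    probs = probs.reshape(k, h, w)
    xs = np.linspace(-1.0, 1.0, w)                 # column coordinate grid
    ys = np.linspace(-1.0, 1.0, h)                 # row coordinate grid
    x = (probs.sum(axis=1) * xs).sum(axis=1)       # expectation over columns
    y = (probs.sum(axis=2) * ys).sum(axis=1)       # expectation over rows
    return np.stack([x, y], axis=1)
```

A sharply peaked heatmap yields a keypoint at the peak's location, while a diffuse one yields a smoothed average, which is what makes the bottleneck trainable end-to-end.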