Unsupervised Learning of Visual 3D Keypoints for Control
Authors: Boyuan Chen, Pieter Abbeel, Deepak Pathak
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Keypoint3D across a variety of reinforcement learning benchmark environments, and we perform the following analyses. We first investigate how well our Keypoint3D representations perform compared to other representations for RL. Second, we test the scalability to higher dimensional control problems. Third, we show that our Keypoint3D based policy is capable of manipulating deformable objects as evident from results on a task where a robot must put a scarf around a human mannequin. Finally, we show that our Keypoint3D representations generalize across tasks as well. Our method outperforms prior state-of-the-art across almost all the environments and our ablation study demonstrates its robustness across several design choices. |
| Researcher Affiliation | Academia | UC Berkeley; Carnegie Mellon University. Correspondence to: Deepak Pathak <dpathak@cs.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1 Keypoint3D: RL with 3D Keypoint Bottleneck |
| Open Source Code | Yes | Code and videos at https://buoyancy99.github.io/unsup-3d-keypoints/. |
| Open Datasets | Yes | We choose a set of 3D manipulation environments (Yu et al., 2019), a high-dof 3D locomotion environment (Coumans & Bai, 2016-2019), a customized soft-body environment, and a meta-learning benchmark (Yu et al., 2019) to evaluate our method from different perspectives. These environments were originally developed for state-based RL and are hard tasks for pixel-based RL. |
| Dataset Splits | Yes | We follow the train/test split of the benchmark: we first pre-train our method and baselines on 45 training environments featuring distinct objects for 10M steps. We then conduct transfer learning on 5 unseen test environments with the pre-trained weights for 2M steps. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions that baselines are implemented on top of PPO, but it does not specify version numbers for PPO or any other software libraries or dependencies. |
| Experiment Setup | Yes | As locomotion environments require temporal reasoning, we use a frame stack of 2. |
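The frame stack of 2 mentioned in the experiment setup is a standard trick for giving a pixel-based policy access to temporal information (e.g., velocities) by concatenating the most recent observations. The sketch below is a hypothetical, minimal illustration of the idea, not the paper's implementation; the `FrameStack` class and its methods are assumptions for exposition.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Minimal sketch of observation frame stacking (hypothetical helper,
    not the paper's code): keep the last k frames and concatenate them
    along the channel axis so the policy can infer motion."""

    def __init__(self, k=2):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs):
        # On episode reset, fill the buffer with copies of the first frame.
        for _ in range(self.k):
            self.frames.append(obs)
        return self._stacked()

    def step(self, obs):
        # Append the newest frame; the oldest one drops out automatically.
        self.frames.append(obs)
        return self._stacked()

    def _stacked(self):
        # Channel-wise concatenation: (H, W, C) frames become (H, W, C * k).
        return np.concatenate(list(self.frames), axis=-1)
```

With k = 2, two consecutive RGB frames of shape (H, W, 3) are stacked into a single (H, W, 6) observation, which is enough for the policy to estimate first-order motion cues in locomotion tasks.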