Unsupervised Learning of Visual 3D Keypoints for Control

Authors: Boyuan Chen, Pieter Abbeel, Deepak Pathak

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate Keypoint3D across a variety of reinforcement learning benchmark environments, and we perform the following analyses. We first investigate how well our Keypoint3D representations perform compared to other representations for RL. Second, we test the scalability to higher-dimensional control problems. Third, we show that our Keypoint3D-based policy is capable of manipulating deformable objects, as evident from results on a task where a robot must put a scarf around a human mannequin. Finally, we show that our Keypoint3D representations generalize across tasks as well. Our method outperforms prior state-of-the-art across almost all the environments, and our ablation study demonstrates its robustness across several design choices."
Researcher Affiliation | Academia | "1UC Berkeley, 2Carnegie Mellon University. Correspondence to: Deepak Pathak <dpathak@cs.cmu.edu>."
Pseudocode | Yes | "Algorithm 1 Keypoint3D: RL with 3D Keypoint Bottleneck"
Open Source Code | Yes | "Code and videos at https://buoyancy99.github.io/unsup-3d-keypoints/."
Open Datasets | Yes | "We choose a set of 3D manipulation environments (Yu et al., 2019), a high-DoF 3D locomotion environment (Coumans & Bai, 2016-2019), a customized soft-body environment, and a meta-learning benchmark (Yu et al., 2019) to evaluate our method from different perspectives. These environments were originally developed for state-based RL and are hard tasks for pixel-based RL."
Dataset Splits | Yes | "We follow the train/test split of the benchmark, first pre-training our method and baselines on 45 training environments featuring distinct objects for 10M steps. We then conduct transfer learning on 5 unseen test environments with the pre-trained weights for 2M steps."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions that the baselines are implemented on top of PPO, but it does not specify version numbers for PPO or any other software libraries or dependencies.
Experiment Setup | Yes | "As locomotion environments require temporal reasoning, we use a frame stack of 2."
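Algorithm 1 itself is not reproduced in this report. For orientation, the keypoint-bottleneck idea it refers to is conventionally built on a spatial soft-argmax: each predicted heatmap is converted into an expected image coordinate, giving a small, differentiable set of keypoints the policy can consume. The sketch below is a minimal 2D NumPy illustration of that standard operator, not the paper's implementation; Keypoint3D additionally lifts keypoints to 3D using multi-view camera geometry, which is omitted here.

```python
import numpy as np

def spatial_soft_argmax(heatmaps):
    """Differentiable keypoint extraction (illustrative sketch).

    heatmaps: array of shape (K, H, W), one unnormalized map per keypoint.
    Returns (K, 2) expected (x, y) coordinates, each in [-1, 1].
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)      # per-map spatial softmax
    probs = probs.reshape(k, h, w)
    xs = np.linspace(-1.0, 1.0, w)                 # column coordinate grid
    ys = np.linspace(-1.0, 1.0, h)                 # row coordinate grid
    x = (probs.sum(axis=1) * xs).sum(axis=1)       # expectation over columns
    y = (probs.sum(axis=2) * ys).sum(axis=1)       # expectation over rows
    return np.stack([x, y], axis=1)
```

A sharply peaked heatmap yields a keypoint at the peak's location, while a diffuse one yields a smoothed average, which is what makes the bottleneck trainable end-to-end.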