Canonical Capsules: Self-Supervised Capsules in Canonical Pose

Authors: Weiwei Sun, Andrea Tagliasacchi, Boyang Deng, Sara Sabour, Soroosh Yazdani, Geoffrey E. Hinton, Kwang Moo Yi

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a self-supervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network we require neither classification labels nor manually-aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification. ... Section 4 is dedicated to Experiments. (A minimal sketch of the rotation-pair self-supervision described here follows the table.)
Researcher Affiliation | Collaboration | University of British Columbia, University of Toronto, Google Research, University of Victoria (equal contributions)
Pseudocode | No | The paper describes algorithmic steps and equations but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | In addition to the public code release that we will do, we have included the code in the supplementary.
Open Datasets | Yes | To evaluate our method, we rely on the ShapeNet (Core) dataset [3]. ... [3] Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. ShapeNet: An Information-Rich 3D Model Repository. arXiv preprint, 2015.
Dataset Splits | Yes | We also use the same splits as in AtlasNetV2 [12]: 31747 shapes in the train set and 7943 shapes in the test set.
Hardware Specification | Yes | We train each model on a single NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions the Adam optimizer [29] but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | For all our experiments we use the Adam optimizer [29] with an initial learning rate of 0.001 and decay rate of 0.1. We train for 325 epochs in the aligned setup to match the original AtlasNetV2 [12] setup. For the unaligned setting, as the problem is harder, we train longer, for 450 epochs. We use a batch size of 16. (A hedged sketch of this training configuration follows the table.)
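
The Research Type row quotes the abstract's core idea: permutation-equivariant attention masks are aggregated into keypoints, and training is self-supervised with pairs of randomly rotated copies of each shape. Below is a minimal PyTorch sketch of that idea, not the authors' released code: the encoder argument stands in for the paper's attention network, and the QR-based rotation sampler, the keypoint aggregation, and the specific equivariance loss form are illustrative assumptions rather than the paper's exact formulation.

# Minimal sketch (assumptions noted in comments) of rotation-pair self-supervision:
# attention masks are aggregated into keypoints, and keypoints from two randomly
# rotated copies of the same shape must be related by the known relative rotation.
import torch


def random_rotation(batch_size: int) -> torch.Tensor:
    # Sample random 3D rotations via batched QR (assumption: any reasonable
    # sampler of rotation matrices suffices for this illustration).
    q, _ = torch.linalg.qr(torch.randn(batch_size, 3, 3))
    det = torch.det(q).sign().view(-1, 1, 1)
    return q * det  # flip sign if needed so det(R) = +1


def keypoints_from_attention(points: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
    # points: (B, N, 3); attn: (B, K, N) with rows summing to 1 per capsule.
    # Returns (B, K, 3) attention-weighted centroids, i.e. the keypoints.
    return torch.einsum("bkn,bnd->bkd", attn, points)


def equivariance_loss(encoder, points: torch.Tensor) -> torch.Tensor:
    # Self-supervised loss on a pair of randomly rotated copies of `points`.
    # `encoder` is a hypothetical stand-in returning (B, K, N) attention masks.
    B = points.shape[0]
    r1, r2 = random_rotation(B), random_rotation(B)
    p1 = points @ r1.transpose(1, 2)                 # copy 1, rotated by R1
    p2 = points @ r2.transpose(1, 2)                 # copy 2, rotated by R2
    k1 = keypoints_from_attention(p1, encoder(p1))   # (B, K, 3)
    k2 = keypoints_from_attention(p2, encoder(p2))
    # Keypoints must transform with the inputs: mapping k1 back to the original
    # frame and forward with R2 should reproduce k2.
    k1_in_frame2 = (k1 @ r1) @ r2.transpose(1, 2)
    return ((k1_in_frame2 - k2) ** 2).mean()


if __name__ == "__main__":
    # Dummy "encoder" (softmax over random per-point logits) just to exercise the code.
    dummy = lambda pts: torch.softmax(torch.randn(pts.shape[0], 8, pts.shape[1]), dim=-1)
    print(equivariance_loss(dummy, torch.randn(2, 1024, 3)))

In the paper these keypoints additionally supervise a capsule decomposition and a canonicalization operation; that part is not reproduced here.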
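
The Experiment Setup row reports Adam with an initial learning rate of 0.001, a decay rate of 0.1, a batch size of 16, and 325 (aligned) or 450 (unaligned) epochs. The following hedged PyTorch sketch shows one way to wire up that configuration; the timing of the learning-rate decay, the placeholder model, the random stand-in data, and the reconstruction-style loss are assumptions for illustration only, not the authors' setup.

# Hedged sketch of the reported optimization settings (Adam, lr 1e-3, decay 0.1,
# batch size 16; 325 epochs aligned / 450 epochs unaligned).
import torch
from torch.utils.data import DataLoader, TensorDataset

ALIGNED = True
EPOCHS = 325 if ALIGNED else 450

# Placeholder data: 64 "shapes" of 1024 points each (stands in for ShapeNet Core).
train_set = TensorDataset(torch.randn(64, 1024, 3))
loader = DataLoader(train_set, batch_size=16, shuffle=True)

# Placeholder model; the paper uses its capsule architecture instead.
model = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Assumption: the decay rate of 0.1 is applied once late in training; the paper
# does not state the schedule.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[int(0.8 * EPOCHS)], gamma=0.1)

for epoch in range(EPOCHS):
    for (points,) in loader:
        optimizer.zero_grad()
        # Placeholder objective; the real training uses the paper's capsule losses.
        loss = ((model(points) - points) ** 2).mean()
        loss.backward()
        optimizer.step()
    scheduler.step()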