OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models

Authors: Xingyi He, Jiaming Sun, Yuang Wang, Di Huang, Hujun Bao, Xiaowei Zhou

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the proposed pipeline outperforms existing one-shot CAD-model-free methods by a large margin and is comparable to CAD-model-based methods on LINEMOD, even for low-textured objects. We evaluate our framework on the OnePose [48] dataset and the LINEMOD [16] dataset. The experiments show that our method outperforms all existing one-shot pose estimation methods [48, 33] by a large margin and even achieves comparable results with instance-level methods [39, 29] which are trained for each object instance with a CAD model.
Researcher Affiliation | Collaboration | ¹Zhejiang University, ²Image Derivative Inc., ³The University of Sydney
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Method details are provided through descriptive text, equations, and figures.
Open Source Code | Yes | The supplementary material, code and dataset are available on the project page: https://zju3dv.github.io/onepose_plus_plus/.
Open Datasets | Yes | We validate our method on the OnePose [48] and LINEMOD [16] datasets. The OnePose dataset is newly proposed, which contains around 450 real-world video sequences of 150 objects. We also collect a new dataset named OnePose-LowTexture, which comprises 80 sequences of 40 low-textured objects. The supplementary material, code and dataset are available on the project page: https://zju3dv.github.io/onepose_plus_plus/. The OnePose [48] dataset and the LINEMOD [16] dataset used in the paper are public; our code and the collected OnePose-LowTexture dataset will be published.
Dataset Splits | Yes | For both datasets, we follow the train-test split used in previous methods [48, 29].
Hardware Specification | Yes | The network training takes about 20 hours with a batch size of 32 on 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions several tools and frameworks (e.g., COLMAP [44], DeepLM [18], LoFTR [47], ResNet-18 [15], YOLOv5 [1], the AdamW optimizer) but does not provide version numbers for these software dependencies, which are required for reproducibility.
Experiment Setup | Yes | We use ResNet-18 [15] as the image backbone and set Nc = 3, Nf = 1 for the 2D-3D attention module. The scale factor τ is 0.08, the cropped window size w at the fine level is 5, and the confidence threshold θ is set to 0.4. The entire model is trained on the OnePose training set, and we randomly sample or pad the reconstructed point cloud to 7000 points for training. We use the AdamW optimizer with an initial learning rate of 4 × 10⁻³. (The quoted hyperparameters are consolidated in the sketches after this table.)
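
For reference, the sketch below consolidates the hyperparameters quoted in the Experiment Setup and Hardware Specification rows into a single config object. This is a minimal illustration, not the authors' code: only the numeric values come from the paper, while the class and field names are assumptions chosen for readability.

```python
# Hedged consolidation of the paper's reported hyperparameters.
# Field names are hypothetical; values are quoted from the paper.
from dataclasses import dataclass

@dataclass
class OnePosePlusPlusConfig:
    backbone: str = "resnet18"          # ResNet-18 image backbone [15]
    n_coarse_attn_layers: int = 3       # Nc = 3 (2D-3D attention module)
    n_fine_attn_layers: int = 1         # Nf = 1 (2D-3D attention module)
    scale_factor: float = 0.08          # τ = 0.08
    fine_window_size: int = 5           # cropped window w at the fine level
    confidence_threshold: float = 0.4   # θ = 0.4
    num_points: int = 7000              # sample/pad point-cloud size
    optimizer: str = "AdamW"            # AdamW optimizer
    learning_rate: float = 4e-3         # initial learning rate 4 × 10⁻³
    batch_size: int = 32                # trained ~20 h on 8 NVIDIA V100 GPUs

config = OnePosePlusPlusConfig()
print(config)
```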
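The Experiment Setup row also states that the reconstructed point cloud is "randomly sampled or padded" to 7000 points for training. The following is a minimal sketch of what such a step could look like; the function name, signature, and the pad-by-repetition strategy are assumptions, since the paper does not specify the exact procedure.

```python
# A minimal, hypothetical sketch of sampling or padding a point cloud to a
# fixed size, as described in the Experiment Setup row. Not the authors' code.
from typing import Optional
import numpy as np

def sample_or_pad(points: np.ndarray, target: int = 7000,
                  rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return exactly `target` points: randomly subsample if there are too
    many, pad by repeating randomly chosen points if there are too few."""
    rng = rng or np.random.default_rng()
    n = points.shape[0]
    if n >= target:
        idx = rng.choice(n, size=target, replace=False)   # random subsample
    else:
        pad = rng.choice(n, size=target - n, replace=True)  # repeated points
        idx = np.concatenate([np.arange(n), pad])
    return points[idx]

# Example: a 9000-point SfM cloud is subsampled, a 5000-point cloud is padded.
assert sample_or_pad(np.random.rand(9000, 3)).shape == (7000, 3)
assert sample_or_pad(np.random.rand(5000, 3)).shape == (7000, 3)
```

Fixing the point count this way lets the 2D-3D attention module operate on constant-size inputs across objects whose SfM reconstructions vary in density.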