OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models

Authors: Xingyi He, Jiaming Sun, Yuang Wang, Di Huang, Hujun Bao, Xiaowei Zhou

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the proposed pipeline outperforms existing one-shot CAD-model-free methods by a large margin and is comparable to CAD-model-based methods on LINEMOD, even for low-textured objects. We evaluate our framework on the OnePose [48] dataset and the LINEMOD [16] dataset. The experiments show that our method outperforms all existing one-shot pose estimation methods [48, 33] by a large margin and even achieves comparable results with instance-level methods [39, 29] which are trained for each object instance with a CAD model.
Researcher Affiliation | Collaboration | ¹Zhejiang University, ²Image Derivative Inc., ³The University of Sydney
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Method details are provided through descriptive text, equations, and figures.
Open Source Code | Yes | The supplementary material, code and dataset are available on the project page: https://zju3dv.github.io/onepose_plus_plus/.
Open Datasets | Yes | We validate our method on the OnePose [48] and LINEMOD [16] datasets. The OnePose dataset is newly proposed, which contains around 450 real-world video sequences of 150 objects. We also collect a new dataset named OnePose-LowTexture, which comprises 80 sequences of 40 low-textured objects. The supplementary material, code and dataset are available on the project page: https://zju3dv.github.io/onepose_plus_plus/. The OnePose [48] dataset and the LINEMOD [16] dataset used in the paper are public; our code and the collected OnePose-LowTexture dataset will be published.
Dataset Splits | Yes | For both datasets, we follow the train-test split used in previous methods [48, 29].
Hardware Specification | Yes | The network training takes about 20 hours with a batch size of 32 on 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions several tools and frameworks (e.g., COLMAP [44], DeepLM [18], LoFTR [47], ResNet-18 [15], YOLOv5 [1], the AdamW optimizer) but does not provide version numbers for these software dependencies, which are required for reproducibility.
Experiment Setup | Yes | We use ResNet-18 [15] as the image backbone and set Nc = 3, Nf = 1 for the 2D-3D attention module. The scale factor τ is 0.08, the cropped window size w at the fine level is 5, and the confidence threshold θ is set to 0.4. The entire model is trained on the OnePose training set, and we randomly sample or pad the reconstructed point cloud to 7000 points for training. We use the AdamW optimizer with an initial learning rate of 4 × 10⁻³. (The quoted hyperparameters are consolidated in the sketches after this table.)
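
For reference, the sketch below consolidates the hyperparameters quoted in the Experiment Setup and Hardware Specification rows into a single config object. This is a minimal illustration, not the authors' code: only the numeric values come from the paper, while the class and field names are assumptions chosen for readability.

```python
# Hedged consolidation of the paper's reported hyperparameters.
# Field names are hypothetical; values are quoted from the paper.
from dataclasses import dataclass

@dataclass
class OnePosePlusPlusConfig:
    backbone: str = "resnet18"          # ResNet-18 image backbone [15]
    n_coarse_attn_layers: int = 3       # Nc = 3 (2D-3D attention module)
    n_fine_attn_layers: int = 1         # Nf = 1 (2D-3D attention module)
    scale_factor: float = 0.08          # τ = 0.08
    fine_window_size: int = 5           # cropped window w at the fine level
    confidence_threshold: float = 0.4   # θ = 0.4
    num_points: int = 7000              # sample/pad point-cloud size
    optimizer: str = "AdamW"            # AdamW optimizer
    learning_rate: float = 4e-3         # initial learning rate 4 × 10⁻³
    batch_size: int = 32                # trained ~20 h on 8 NVIDIA V100 GPUs

config = OnePosePlusPlusConfig()
print(config)
```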
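The Experiment Setup row also states that the reconstructed point cloud is "randomly sampled or padded" to 7000 points for training. The following is a minimal sketch of what such a step could look like; the function name, signature, and the pad-by-repetition strategy are assumptions, since the paper does not specify the exact procedure.

```python
# A minimal, hypothetical sketch of sampling or padding a point cloud to a
# fixed size, as described in the Experiment Setup row. Not the authors' code.
from typing import Optional
import numpy as np

def sample_or_pad(points: np.ndarray, target: int = 7000,
                  rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return exactly `target` points: randomly subsample if there are too
    many, pad by repeating randomly chosen points if there are too few."""
    rng = rng or np.random.default_rng()
    n = points.shape[0]
    if n >= target:
        idx = rng.choice(n, size=target, replace=False)   # random subsample
    else:
        pad = rng.choice(n, size=target - n, replace=True)  # repeated points
        idx = np.concatenate([np.arange(n), pad])
    return points[idx]

# Example: a 9000-point SfM cloud is subsampled, a 5000-point cloud is padded.
assert sample_or_pad(np.random.rand(9000, 3)).shape == (7000, 3)
assert sample_or_pad(np.random.rand(5000, 3)).shape == (7000, 3)
```

Fixing the point count this way lets the 2D-3D attention module operate on constant-size inputs across objects whose SfM reconstructions vary in density.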