SD-Pose: Semantic Decomposition for Cross-Domain 6D Object Pose Estimation
Authors: Zhigang Li, Yinlin Hu, Mathieu Salzmann, Xiangyang Ji
AAAI 2021, pp. 2020-2028
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive analyses and experiments show that our semantic decomposition strategy can fully utilize the different domain similarities of different representations, thus allowing us to outperform the state of the art on modern 6D object pose datasets without accessing any real data during training. |
| Researcher Affiliation | Collaboration | Zhigang Li (Tsinghua University), Yinlin Hu (EPFL), Mathieu Salzmann (EPFL; ClearSpace SA), and Xiangyang Ji (Tsinghua University) |
| Pseudocode | No | The paper describes its methodology in text and uses diagrams, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code, nor does it explicitly state that its code is open-source or available. |
| Open Datasets | Yes | We conduct our experiments on both the LineMOD and Occluded-LineMOD datasets. LineMOD (Hinterstoisser et al. 2012) is the de facto standard benchmark for 6D pose estimation of textureless objects in cluttered scenes. ... During training, for all synthetic images, the background is randomly replaced with indoor images from the PASCAL VOC2012 dataset (a sketch of this augmentation appears after the table). |
| Dataset Splits | No | The paper states how the training and test sets are used (e.g., 'the test set consists of all occluded images' and 'only use synthetic data for training') and mentions following previous works for splitting, but it does not explicitly specify a distinct validation set split or its size/percentage. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory amounts) used for running its experiments, only general mentions of training models. |
| Software Dependencies | No | The paper mentions using Mask R-CNN and an OpenGL-based renderer, but it does not specify version numbers for any software components, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | Concretely, during rendering, the rotation is uniformly sampled from the angle range of the training set, and the translation is randomly generated according to the mean and variance calculated from the training set. ... We set h_in = w_in = 256. ... We set h_out = w_out = n_bin = 64. ... we use the masked cross-entropy loss for the coordinate maps, which only calculates the cross-entropy on the object foreground region. For the confidence maps, we compute the binary cross-entropy on the whole region. (Sketches of the pose sampling and the masked losses follow the table.) |
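
The background-replacement augmentation quoted in the Open Datasets row is straightforward to reproduce. Below is a minimal sketch, assuming the renderer outputs RGBA images whose alpha channel marks the object foreground and that `voc_dir` points at a folder of PASCAL VOC2012 JPEGs; the function name and the alpha convention are our assumptions, not the paper's.

```python
import random
from pathlib import Path

import numpy as np
from PIL import Image

def replace_background(render_rgba: Image.Image, voc_dir: str) -> Image.Image:
    """Paste a synthetic render over a random PASCAL VOC image.

    Assumes alpha > 0 marks the object foreground (our convention,
    not stated in the paper).
    """
    bg_paths = list(Path(voc_dir).glob("*.jpg"))
    bg = Image.open(random.choice(bg_paths)).convert("RGB")
    bg = bg.resize(render_rgba.size)

    rgba = np.asarray(render_rgba)
    mask = rgba[..., 3:4] > 0                        # object foreground
    composite = np.where(mask, rgba[..., :3], np.asarray(bg))
    return Image.fromarray(composite.astype(np.uint8))
```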
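
The pose-sampling procedure in the Experiment Setup row can likewise be sketched. The paper does not state which rotation parameterization is used, so the Euler-angle convention below (and the function name) is an assumption:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def sample_pose(angle_ranges, t_mean, t_std, rng=None):
    """Sample a synthetic training pose: rotation uniformly from the
    training-set angle range, translation from a Gaussian fit to the
    training-set mean and variance.

    angle_ranges: (3, 2) array of (low, high) Euler-angle bounds in
    degrees; the "xyz" convention here is our assumption.
    """
    if rng is None:
        rng = np.random.default_rng()
    euler = [rng.uniform(lo, hi) for lo, hi in angle_ranges]
    R = Rotation.from_euler("xyz", euler, degrees=True).as_matrix()
    t = rng.normal(t_mean, t_std)                    # (3,) translation
    return R, t
```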
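
Finally, a hedged sketch of the two losses described in the Experiment Setup row: cross-entropy on the coordinate-map bins restricted to the foreground mask, and binary cross-entropy on the confidence map over the whole region. The paper sets n_bin = 64; the tensor shapes and the use of PyTorch are our assumptions.

```python
import torch
import torch.nn.functional as F

def sd_pose_losses(coord_logits, coord_target, conf_logits, conf_target, fg_mask):
    """Masked losses as described in the paper (shapes are assumptions).

    coord_logits: (B, n_bin, H, W) per-pixel bin scores for a coordinate map
    coord_target: (B, H, W) integer bin indices in [0, n_bin)
    conf_logits:  (B, 1, H, W) confidence scores
    conf_target:  (B, 1, H, W) binary targets
    fg_mask:      (B, H, W) float mask, 1 on the object foreground
    """
    # Cross-entropy over coordinate bins, averaged over foreground pixels only.
    ce = F.cross_entropy(coord_logits, coord_target, reduction="none")  # (B, H, W)
    coord_loss = (ce * fg_mask).sum() / fg_mask.sum().clamp(min=1)

    # Binary cross-entropy on the confidence map over the whole region.
    conf_loss = F.binary_cross_entropy_with_logits(conf_logits, conf_target)

    return coord_loss, conf_loss
```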