3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation
Authors: Chen Zhao, Tong Zhang, Mathieu Salzmann
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experiments on the Objaverse, LINEMOD, and CO3D datasets evidence the superior accuracy of our approach in relative pose estimation and its robustness in large-scale pose variations, when dealing with unseen objects. |
| Researcher Affiliation | Collaboration | Chen Zhao EPFL-CVLab chen.zhao@epfl.ch; Tong Zhang EPFL-IVRL tong.zhang@epfl.ch; Mathieu Salzmann EPFL-CVLab, Clear Space SA mathieu.salzmann@epfl.ch |
| Pseudocode | No | The paper describes its methodology using textual descriptions, mathematical formulations, and diagrams (e.g., Figure 2 for the overview of the framework), but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The abstract states: "Our project website is at: https://sailor-z.github.io/projects/ICLR2024_3DAHV.html." This is a project overview page, not a direct link to a source-code repository. |
| Open Datasets | Yes | Our comprehensive experiments on the Objaverse, LINEMOD, and CO3D datasets evidence the superior accuracy of our approach in relative pose estimation and its robustness in large-scale pose variations, when dealing with unseen objects. ... To this end, we utilize the Objaverse (Deitke et al., 2023) and LINEMOD (Hinterstoisser et al., 2012) datasets, which include synthetic and real data, respectively. ... We first perform an evaluation using the benchmark defined in (Lin et al., 2023), where the experiments are conducted on the CO3D (Reizenstein et al., 2021) dataset. ... The synthetic images are generated by rendering objects of Objaverse from randomly sampled viewpoints (Liu et al., 2023). We attach these images to random backgrounds which are sampled from COCO (Lin et al., 2014). (A minimal sketch of this background-compositing step is given below the table.) |
| Dataset Splits | No | The paper mentions training and testing data and describes object-level splits (e.g., "reserving the remaining objects for training"). It specifies the number of testing image pairs. However, it does not explicitly provide percentages or counts for a distinct validation dataset split separate from training and testing. |
| Hardware Specification | Yes | Training takes around 4 days on 4 NVIDIA Tesla V100s. |
| Software Dependencies | No | The paper mentions the use of the AdamW optimizer and various architectural components such as transformers and layer normalization, citing the papers where they were introduced. However, it does not specify version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | We set the number of hypotheses during training and testing to M = 9,000 and M = 50,000, respectively. We define the masking threshold h = 0.25 and the geodesic distance threshold λ = 15 (Zhang et al., 2022; Lin et al., 2023). We train our network for 25 epochs using the AdamW (Loshchilov & Hutter, 2017) optimizer with a batch size of 48 and a learning rate of 10⁻⁴, which is divided by 10 after 20 epochs. (A hypothetical configuration sketch of these settings appears below the table.) |
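
The training-data construction quoted in the "Open Datasets" row (rendering Objaverse objects from random viewpoints and attaching them to random COCO backgrounds) can be illustrated with a minimal compositing sketch. The paper does not release code for this step, so the function name and the assumption that the renderer provides an alpha channel are ours, not the authors'.

```python
import random
import numpy as np

def composite_onto_background(render_rgba, backgrounds):
    """Paste a rendered RGBA object image onto a randomly sampled background.

    `render_rgba` is assumed to be an H x W x 4 uint8 array from the renderer,
    whose alpha channel marks object pixels; `backgrounds` is a list of
    H x W x 3 uint8 images (e.g. COCO crops resized to the render resolution).
    """
    background = random.choice(backgrounds).astype(np.float32)
    rgb = render_rgba[..., :3].astype(np.float32)
    alpha = render_rgba[..., 3:4].astype(np.float32) / 255.0  # ~1 on the object, ~0 elsewhere
    blended = alpha * rgb + (1.0 - alpha) * background
    return blended.astype(np.uint8)
```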
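
The hyperparameters quoted in the "Experiment Setup" row map onto a small configuration sketch. Since no source code is released (see the "Open Source Code" row), the snippet below is a hypothetical reconstruction of the reported settings rather than the authors' implementation; all key and function names are ours. The geodesic-distance function is the standard rotation-error metric that the λ = 15° accuracy threshold presumably refers to.

```python
import numpy as np

# Settings reported in the paper's experiment setup.
CONFIG = {
    "num_hypotheses_train": 9_000,    # M during training
    "num_hypotheses_test": 50_000,    # M during testing
    "masking_threshold": 0.25,        # h
    "geodesic_threshold_deg": 15.0,   # lambda, accuracy threshold in degrees
    "epochs": 25,
    "batch_size": 48,
    "learning_rate": 1e-4,            # divided by 10 after 20 epochs
    "lr_decay_epoch": 20,
    "lr_decay_factor": 0.1,
}

def learning_rate_at(epoch):
    """Step schedule: 1e-4 for the first 20 epochs, then divided by 10."""
    lr = CONFIG["learning_rate"]
    if epoch >= CONFIG["lr_decay_epoch"]:
        lr *= CONFIG["lr_decay_factor"]
    return lr

def geodesic_distance_deg(R_pred, R_gt):
    """Geodesic distance (degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    cos = np.clip(cos, -1.0, 1.0)  # guard against numerical drift
    return float(np.degrees(np.arccos(cos)))

def is_correct(R_pred, R_gt):
    """A predicted relative rotation counts as correct if its error is below lambda."""
    return geodesic_distance_deg(R_pred, R_gt) < CONFIG["geodesic_threshold_deg"]
```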