Visual Correspondence Hallucination
Authors: Hugo Germain, Vincent Lepetit, Guillaume Bourmaud
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate that this network is indeed able to hallucinate correspondences on pairs of images captured in scenes that were not seen at training-time. We also apply this network to an absolute camera pose estimation problem and find it is significantly more robust than state-of-the-art local feature matching-based competitors. [Sec. 4, Experiments] In these experiments, we seek to answer two questions: 1) "Is the proposed NeurHal approach presented in Sec. 3 capable of hallucinating correspondences?" and 2) "In the context of absolute camera pose estimation, does the ability to hallucinate correspondences bring further robustness?" |
| Researcher Affiliation | Academia | Hugo Germain¹, Vincent Lepetit¹ and Guillaume Bourmaud²; ¹LIGM, École des Ponts, Univ Gustave Eiffel, CNRS, Marne-la-Vallée, France; ²IMS, University of Bordeaux, Bordeaux INP, CNRS, Bordeaux, France |
| Pseudocode | No | The paper provides an architectural diagram (Figure 2) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the NeurHal model architecture and weights in the supplementary material. We also release a simple evaluation script that generates qualitative results, and show in a notebook the results obtained on an image pair captured indoors using a smartphone. |
| Open Datasets | Yes | We evaluate the ability of our network to hallucinate correspondences on four datasets: the indoor datasets ScanNet (Dai et al., 2017) and NYU (Nathan Silberman & Fergus, 2012), and the outdoor datasets MegaDepth (Li & Snavely, 2018) and ETH-3D (Schöps et al., 2017). |
| Dataset Splits | Yes | For the indoor setting (outdoor setting, respectively), we train NeurHal on ScanNet (MegaDepth, respectively) on the training scenes as described in Sec. 3.4, and evaluate it on the disjoint set of validation scenes. For testing images, we sample 2,500 image pairs with overlaps between 2% and 80% from the ScanNet testing scenes, using several bins to ensure the sampling is close to being uniform. (A sketch of this binned sampling appears below the table.) |
| Hardware Specification | Yes | For an indoor sample with 2000 keypoints it has an average throughput of 8.84 images/s on an NVIDIA RTX 3070 GPU. We apply the linear scaling rule and use a batch size of 8 over 8 NVIDIA V100 GPUs. (A throughput-measurement sketch follows the table.) |
| Software Dependencies | No | The model is implemented in PyTorch (Paszke et al., 2017). No specific version numbers for PyTorch or other libraries are provided. |
| Experiment Setup | Yes | We use an initial learning rate of 10⁻³, with a linear learning rate warm-up in 3 epochs from 0.1 of the initial learning rate. As in Sun et al. (2021), we decay the learning rate by 0.5 every 8 epochs starting from the 8th epoch. We apply the linear scaling rule and use a batch size of 8 over 8 NVIDIA V100 GPUs. We use the AdamW (Loshchilov & Hutter, 2019) optimizer, with a weight decay of 0.1. (A sketch of this optimizer and schedule follows the table.) |
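
The binned pair sampling quoted under Dataset Splits can be made concrete with a short sketch. This is not the authors' released code; the bin count, the exact bin edges, and the `pairs`/`overlaps` inputs are assumptions made for illustration.

```python
# Sketch (ours, not the authors' release) of near-uniform pair sampling
# by overlap, as described for the ScanNet test split. The bin count,
# the exact bin edges, and the `pairs`/`overlaps` inputs are assumptions.
import numpy as np

def sample_pairs_by_overlap(pairs, overlaps, n_samples=2500,
                            lo=0.02, hi=0.80, n_bins=10, seed=0):
    """Draw ~n_samples pairs with overlap in [lo, hi), close to
    uniformly across n_bins equal-width overlap bins."""
    rng = np.random.default_rng(seed)
    overlaps = np.asarray(overlaps)
    edges = np.linspace(lo, hi, n_bins + 1)
    per_bin = n_samples // n_bins
    chosen = []
    for b in range(n_bins):
        in_bin = np.flatnonzero(
            (overlaps >= edges[b]) & (overlaps < edges[b + 1]))
        k = min(per_bin, in_bin.size)  # a bin may hold fewer pairs than asked
        chosen.extend(rng.choice(in_bin, size=k, replace=False).tolist())
    return [pairs[i] for i in chosen]
```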
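
The 8.84 images/s figure under Hardware Specification is an average throughput. A minimal way to measure such a number in PyTorch is shown below; the `model` and `images` arguments are placeholders, and this timing harness is ours rather than the paper's.

```python
# Sketch of how an average-throughput figure (images/s) can be measured
# on a CUDA GPU. `model` and `images` are placeholders; this harness is
# ours, not the paper's.
import time
import torch

@torch.no_grad()
def throughput(model, images, n_iters=50, warmup=10):
    model = model.eval().cuda()
    images = images.cuda()
    for _ in range(warmup):        # warm-up so lazy CUDA init is excluded
        model(images)
    torch.cuda.synchronize()       # make sure queued kernels have finished
    t0 = time.perf_counter()
    for _ in range(n_iters):
        model(images)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - t0
    return n_iters * images.shape[0] / elapsed  # images per second
```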
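
The hyper-parameters quoted under Experiment Setup translate directly into a PyTorch optimizer and learning-rate schedule. A minimal sketch follows, assuming the schedule is stepped once per epoch (the quote does not say) and using a placeholder module in place of the NeurHal network.

```python
# Sketch of the quoted optimizer and schedule in PyTorch: AdamW with
# weight decay 0.1, base LR 1e-3, 3-epoch linear warm-up from 0.1x the
# base LR, then 0.5x decay every 8 epochs from the 8th epoch. The
# placeholder module and the per-epoch stepping are assumptions.
import torch

model = torch.nn.Linear(256, 256)  # placeholder for the NeurHal network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

def lr_lambda(epoch):
    if epoch < 3:                        # linear warm-up: 0.1 -> 1.0
        return 0.1 + 0.9 * epoch / 3
    if epoch >= 8:                       # halve every 8 epochs from epoch 8
        return 0.5 ** ((epoch - 8) // 8 + 1)
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# Call scheduler.step() once per epoch: the LR runs 1e-4 -> 1e-3 over
# the first three epochs, then drops to 5e-4 at epoch 8, 2.5e-4 at 16.
```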