Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization
Authors: Sanghyeob Song, Jaihyun Lew, Hyemi Jang, Sungroh Yoon
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results are shown in Table 1. When image pairs are unaligned, or when the identity matrix is used as the homography, MACE typically exceeds 23 px; any method whose MACE stays above this threshold can therefore be considered a failed training run. (A minimal sketch of the MACE metric follows the table.) |
| Researcher Affiliation | Collaboration | Sanghyeob Song¹,³, Jaihyun Lew¹, Hyemi Jang², Sungroh Yoon¹,². ¹Interdisciplinary Program in Artificial Intelligence, Seoul National University; ²Department of Electrical and Computer Engineering, Seoul National University; ³Samsung Electro-Mechanics. {songsang7, fudojhl, wkdal9512, sryoon}@snu.ac.kr |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | The source code can be found at: https://github.com/songsang7/AltO |
| Open Datasets | Yes | Google Map is a multimodal dataset proposed in DLKFM [11]. It consists of pairs of satellite images and corresponding maps, which have different representation styles. There are approximately 9k training pairs and 1k test pairs of size 128×128. Google Earth is another DLKFM dataset that provides multimodality by consisting of images of the same area taken in different seasons. The amount of data is about 9k pairs for training and 1k for test. The input image size is also 128×128. DeepNIR is a dataset proposed in [31]. |
| Dataset Splits | No | The paper mentions 'training pairs' and 'test pairs' for its datasets but does not explicitly state the use or size of a separate 'validation' split. |
| Hardware Specification | Yes | Our experimental settings include using the PyTorch 1.13 library and an Nvidia RTX 8000 GPU with 48 GB of VRAM for training each model. |
| Software Dependencies | Yes | Our experimental settings include using the PyTorch 1.13 library and an Nvidia RTX 8000 GPU with 48 GB of VRAM for training each model. |
| Experiment Setup | Yes | The models were optimized using the AdamW optimizer [34], a one-cycle learning rate schedule [35], a maximum learning rate of 3e-4, and a weight decay of 1e-5. Additionally, gradient clipping of 1.0 is applied during the backward pass for the Geometry loss. The training protocol runs for a total of 200 epochs with a batch size of 16. Regarding the loss parameters, λ is set to 0.005, consistent with the standard settings of Barlow Twins [12]. (A minimal PyTorch sketch of these settings follows the table.) |
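
The Research Type row cites the paper's failure criterion: on 128×128 pairs, an unaligned (identity-homography) baseline already yields a MACE of roughly 23 px, so a trained model above that value is treated as a failed run. As a reading aid, here is a minimal NumPy sketch of the Mean Average Corner Error and that threshold check; the function names, array layout, and threshold constant are our own labels, not taken from the paper or its code.

```python
import numpy as np

def mace(pred_corners: np.ndarray, gt_corners: np.ndarray) -> float:
    """Mean Average Corner Error (MACE), in pixels.

    pred_corners, gt_corners: arrays of shape (N, 4, 2) holding the
    predicted / ground-truth positions of the four image corners after
    warping by the estimated / true homography.
    """
    # Euclidean distance per corner, averaged over the 4 corners and N pairs.
    per_corner = np.linalg.norm(pred_corners - gt_corners, axis=-1)  # (N, 4)
    return float(per_corner.mean())

# Failure heuristic quoted above: on 128x128 pairs, an unaligned
# (identity-homography) baseline already sits around 23 px, so a model
# whose MACE stays above this threshold is considered a failed run.
FAILURE_THRESHOLD_PX = 23.0

def training_succeeded(pred_corners: np.ndarray, gt_corners: np.ndarray) -> bool:
    return mace(pred_corners, gt_corners) < FAILURE_THRESHOLD_PX
```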
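
To make the Experiment Setup row concrete, the sketch below wires the reported hyperparameters (AdamW, one-cycle LR schedule with max LR 3e-4, weight decay 1e-5, gradient clipping at 1.0, 200 epochs, batch size 16, λ = 0.005) into a generic PyTorch training loop. The model, the two placeholder losses, and the synthetic data are hypothetical stand-ins; the loop does not reproduce the paper's unsupervised, alternating-optimization training scheme, only its optimizer, schedule, and clipping settings.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Only the hyperparameters come from the paper: AdamW, one-cycle LR with
# max lr 3e-4, weight decay 1e-5, gradient clipping at 1.0, 200 epochs,
# batch size 16, and loss weight lambda = 0.005.
# The network and losses below are hypothetical placeholders, not AltO's.
model = nn.Sequential(
    nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 8),  # 8 = 4-corner offsets
)
geometry_loss = nn.L1Loss()   # stand-in for the Geometry loss
modality_loss = nn.MSELoss()  # stand-in for the Barlow-Twins-style modality loss
lam = 0.005

# Tiny synthetic dataset of 128x128 image pairs, just to make the loop run.
pairs = torch.randn(64, 6, 128, 128)
offsets = torch.randn(64, 8)
loader = DataLoader(TensorDataset(pairs, offsets), batch_size=16, shuffle=True)

epochs = 200
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=epochs, steps_per_epoch=len(loader))

for epoch in range(epochs):
    for x, y in loader:
        optimizer.zero_grad()
        pred = model(x)
        loss = geometry_loss(pred, y) + lam * modality_loss(pred, y)
        loss.backward()
        # Paper: clipping of 1.0 during the backward pass for the Geometry
        # loss; here we simply clip all gradients to norm 1.0.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()  # OneCycleLR is stepped once per batch
```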