Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization

Authors: Sanghyeob Song, Jaihyun Lew, Hyemi Jang, Sungroh Yoon

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results are shown in Table 1. When image pairs are unaligned, or when the identity matrix is used as the homography, MACE (Mean Average Corner Error) typically exceeds 23 px; thus, any method yielding a MACE above this threshold can be considered unsuccessful in training. (A MACE sketch follows this table.)
Researcher Affiliation | Collaboration | Sanghyeob Song (1,3), Jaihyun Lew (1), Hyemi Jang (2), Sungroh Yoon (1,2); (1) Interdisciplinary Program in Artificial Intelligence, Seoul National University; (2) Department of Electrical and Computer Engineering, Seoul National University; (3) Samsung Electro-Mechanics. {songsang7, fudojhl, wkdal9512, sryoon}@snu.ac.kr
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The source code can be found at: https://github.com/songsang7/AltO
Open Datasets | Yes | Google Map is a multimodal dataset proposed in DLKFM [11]. It consists of pairs of satellite images and corresponding maps, which have different representation styles. There are approximately 9k training pairs and 1k test pairs of size 128 × 128. Google Earth is another DLKFM dataset that provides multimodality by consisting of images of the same area taken in different seasons; it offers about 9k pairs for training and 1k for testing, also of size 128 × 128. DeepNIR is a dataset proposed in [31].
Dataset Splits | No | The paper mentions 'training pairs' and 'test pairs' for its datasets but does not explicitly state the use or size of a separate 'validation' split.
Hardware Specification | Yes | Experimental settings of ours include using the PyTorch 1.13 library and an Nvidia RTX 8000 GPU with 48 GB of VRAM for training each model.
Software Dependencies | Yes | Experimental settings of ours include using the PyTorch 1.13 library and an Nvidia RTX 8000 GPU with 48 GB of VRAM for training each model.
Experiment Setup | Yes | The models were optimized using the AdamW optimizer [34], a one-cycle learning rate schedule [35], a maximum learning rate of 3e-4, and a weight decay of 1e-5. Additionally, gradient clipping of 1.0 is applied during the backward pass for the Geometry loss. Training runs for a total of 200 epochs with a batch size of 16. Regarding the loss parameters, λ is set to 0.005, consistent with the standard settings of Barlow Twins [12]. (Barlow Twins and training-loop sketches follow this table.)
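
As a reading aid for the MACE failure threshold quoted in the Research Type row, here is a minimal sketch of Mean Average Corner Error for batched 3×3 homographies, assuming 128 × 128 inputs as in Google Map and Google Earth. The function name `mace` and the corner parameterization are illustrative assumptions, not code from the AltO repository.

```python
import torch

def mace(h_pred: torch.Tensor, h_gt: torch.Tensor,
         height: int = 128, width: int = 128) -> torch.Tensor:
    """Mean Average Corner Error: average L2 distance (in pixels) between the
    four image corners warped by the predicted and ground-truth homographies.
    Both inputs are batched 3x3 matrices of shape (B, 3, 3)."""
    corners = torch.tensor([[0.0, 0.0],
                            [width - 1.0, 0.0],
                            [width - 1.0, height - 1.0],
                            [0.0, height - 1.0]])
    pts = torch.cat([corners, torch.ones(4, 1)], dim=1)  # (4, 3) homogeneous corners

    def warp(h: torch.Tensor) -> torch.Tensor:
        p = pts @ h.transpose(-1, -2)                     # (B, 4, 3)
        return p[..., :2] / p[..., 2:3]                   # perspective divide

    return torch.norm(warp(h_pred) - warp(h_gt), dim=-1).mean()
```

Under this metric, leaving unaligned 128 × 128 pairs untouched (identity homography) typically yields values above the roughly 23 px threshold cited above.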
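
The Experiment Setup row states that λ = 0.005 follows the standard Barlow Twins setting. The sketch below is the standard Barlow Twins objective with that weight, shown only to illustrate where λ enters; the normalization details follow the original Barlow Twins formulation and may differ from the paper's exact modality loss.

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lam: float = 0.005) -> torch.Tensor:
    """Standard Barlow Twins objective on two (N, D) embedding batches:
    drive their cross-correlation matrix toward the identity, weighting
    the off-diagonal (redundancy-reduction) terms by lam."""
    n = z_a.shape[0]
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)  # standardize each feature
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n                            # (D, D) cross-correlation
    diag = torch.diagonal(c)
    on_diag = ((diag - 1.0) ** 2).sum()              # invariance term
    off_diag = (c ** 2).sum() - (diag ** 2).sum()    # redundancy-reduction term
    return on_diag + lam * off_diag
```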
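
The optimizer, schedule, clipping, and epoch/batch settings from the Experiment Setup row can be wired together roughly as follows. The tiny model, random tensors, and MSE objective are placeholders so the loop runs standalone; they are not the paper's network or its Geometry/modality losses.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

EPOCHS, BATCH_SIZE, MAX_LR, WEIGHT_DECAY, CLIP_NORM = 200, 16, 3e-4, 1e-5, 1.0

# Placeholder network and data: the real model takes a 128x128 multimodal image
# pair and regresses a homography (here reduced to an 8-vector of corner offsets).
model = nn.Sequential(nn.Flatten(), nn.Linear(2 * 3 * 128 * 128, 8))
pairs = TensorDataset(torch.randn(64, 2, 3, 128, 128), torch.randn(64, 8))
loader = DataLoader(pairs, batch_size=BATCH_SIZE, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR, weight_decay=WEIGHT_DECAY)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=MAX_LR, epochs=EPOCHS, steps_per_epoch=len(loader))

for epoch in range(EPOCHS):
    for x, target in loader:
        loss = nn.functional.mse_loss(model(x), target)  # stand-in objective
        optimizer.zero_grad()
        loss.backward()
        # gradient clipping at 1.0, as reported for the Geometry loss
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
        optimizer.step()
        scheduler.step()
```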