U-Match: Two-view Correspondence Learning with Hierarchy-aware Local Context Aggregation

Authors: Zizhuo Li, Shihua Zhang, Jiayi Ma

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on different visual tasks prove that our method significantly surpasses the state-of-the-arts.
Researcher Affiliation | Academia | Electronic Information School, Wuhan University, Wuhan 430072, China
Pseudocode | No | The paper provides network architecture diagrams and mathematical formulations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is publicly available at https://github.com/ZizhuoLi/U-Match.
Open Datasets | Yes | Datasets. As in the previous work [Zhang et al., 2019], we resort to two popular datasets, YFCC100M [Thomee et al., 2016] and SUN3D [Xiao et al., 2013], to demonstrate the correspondence learning ability of our method in outdoor and indoor scenes, respectively.
Dataset Splits | Yes | YFCC100M contains 100 million images from the Internet, which are split into 72 sequences according to different tourist spots. We choose 68 sequences as training and validation data, and the remaining sequences are used for testing. SUN3D is a large-scale RGB-D video dataset with relative camera motions retrieved by generalized bundle adjustment. It is comprised of 254 indoor image sequences with poor texture, repetitive elements, and self-occlusions, where 239 sequences are adopted for training and validation, and the rest of the sequences are used for testing.
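The sequence-level splits quoted above can be summarized numerically; a minimal sketch (the constant names are illustrative, and the actual sequence assignments follow Zhang et al., 2019):

```python
# Sequence-level train/val vs. test splits described in the paper.
YFCC_TOTAL, YFCC_TRAINVAL = 72, 68     # YFCC100M: outdoor tourist-spot sequences
SUN3D_TOTAL, SUN3D_TRAINVAL = 254, 239 # SUN3D: indoor RGB-D sequences

# Held-out test sequences are simply the remainder in each dataset.
yfcc_test = YFCC_TOTAL - YFCC_TRAINVAL    # 4 outdoor test sequences
sun3d_test = SUN3D_TOTAL - SUN3D_TRAINVAL # 15 indoor test sequences
```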
Hardware Specification | Yes | All experiments are conducted on Ubuntu 18.04 with GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions implementing the model with PyTorch and using the Adam optimizer but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | In our implementation, the input to our model is an N × 4 set of putative correspondences established by an NN matcher with SIFT detector-descriptors, typically N = 2000, unless otherwise specified. The number of levels is set to L = 4, i.e., each HRGA module contains three LCPool layers with sampling ratios of 0.125, 0.5, 0.5, respectively. We use 4-head attention in the context aggregation layer. We implement our model with PyTorch and adopt the Adam optimizer with a learning rate of 10⁻⁴ and a batch size of 32 in optimization. Weight α is set to 0 at the start and to 0.5 after the first 20k iterations.
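The hyperparameters quoted above can be collected into a small configuration sketch. This is not the authors' code; the names (`pool_sizes`, `alpha_at`) are hypothetical, and the learning rate follows the (extraction-damaged) figure in the text:

```python
# Hypothetical configuration sketch for the reported U-Match setup.
N = 2000                     # putative correspondences per image pair (SIFT + NN matcher)
L = 4                        # hierarchy levels, i.e. L - 1 = 3 LCPool layers per HRGA module
POOL_RATIOS = [0.125, 0.5, 0.5]  # LCPool sampling ratios, coarsest-to-finest as listed
NUM_HEADS = 4                # attention heads in the context aggregation layer
LR = 1e-4                    # Adam learning rate as read from the paper's "10^-4"
BATCH_SIZE = 32

def pool_sizes(n: int, ratios: list[float]) -> list[int]:
    """Number of clusters remaining after each successive LCPool layer."""
    sizes = []
    for r in ratios:
        n = int(round(n * r))
        sizes.append(n)
    return sizes

def alpha_at(iteration: int, warmup: int = 20_000) -> float:
    """Loss weight alpha: 0 during the first `warmup` iterations, 0.5 afterwards."""
    return 0.0 if iteration < warmup else 0.5
```

For the default N = 2000, the three LCPool layers would reduce the correspondence set to 250, 125, and then roughly 62 clusters, and α switches from 0 to 0.5 at iteration 20k.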