ResMatch: Residual Attention Learning for Feature Matching
Authors: Yuxin Deng, Kaining Zhang, Shihua Zhang, Yansheng Li, Jiayi Ma
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments, including feature matching, pose estimation and visual localization, confirm the superiority of the proposed method. |
| Researcher Affiliation | Academia | Yuxin Deng (1), Kaining Zhang (1), Shihua Zhang (1), Yansheng Li (2), Jiayi Ma (1). (1) Electronic Information School, Wuhan University, Wuhan 430072, China; (2) School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China. {dyx_acuo, zkn19961212, yansheng.li}@whu.edu.cn, {suhzhang001, jyma2010}@gmail.com |
| Pseudocode | No | The paper describes formulas and operations but does not include structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or formatted as distinct steps). |
| Open Source Code | Yes | Our codes are available at https://github.com/ACuOoOoO/ResMatch. |
| Open Datasets | Yes | Following SGMNet (Chen et al. 2021), we train the network on the GL3D (Shen et al. 2018). We evaluate our methods for two-view image matching on three datasets, including YFCC100M (Thomee et al. 2016), ScanNet (Dai et al. 2017), and FM-Bench (Bian et al. 2019). |
| Dataset Splits | No | The paper reports training on GL3D and evaluating on YFCC100M, ScanNet, and FM-Bench, but it does not specify train/validation/test splits for GL3D or how validation was performed during training (e.g., percentages, sample counts, or a citation to a standard split). |
| Hardware Specification | Yes | Computational efficiency of feature matching networks on an NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using a 'Sinkhorn algorithm' and integrating with 'Hierarchical Localization (HLoc)', but it does not specify any software names with version numbers (e.g., Python, PyTorch, specific library versions) that are necessary to reproduce the experiment. |
| Experiment Setup | Yes | Our networks consist of 9 sequential blocks of 4-head hybrid attention, whose feature dimension is consistent with that of the input descriptors. During training, 1k features are extracted per image, and 10 iterations of the Sinkhorn algorithm are performed to obtain the assignment matrix. A cross-entropy loss, the same as in SuperGlue (Sarlin et al. 2020), is applied to the final matching probability. The networks are trained for 450,000 iterations with a batch size of 16. k is empirically set to a constant 64. |
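To make the architecture figures in the setup row concrete, the sketch below stacks 9 interleaved self-/cross-attention blocks with 4 heads each, keeping the feature dimension equal to that of the input descriptors. It is a generic illustration under assumptions, not ResMatch itself: the paper's hybrid attention learns residuals over descriptor and positional similarity, which plain `nn.MultiheadAttention` does not model, and the class name `AttentionalStack` and the default `dim=128` are ours.

```python
import torch
import torch.nn as nn

class AttentionalStack(nn.Module):
    """Generic stack of interleaved self-/cross-attention blocks.

    Dimensions follow the setup row above: 9 blocks, 4 heads, and a feature
    dimension equal to that of the input descriptors (dim=128 is an assumed
    placeholder). The paper's residual "hybrid" attention is NOT reproduced
    here; this is only a plain attention baseline of the same shape.
    """

    def __init__(self, dim: int = 128, heads: int = 4, blocks: int = 9):
        super().__init__()
        self.self_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(blocks)]
        )
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(blocks)]
        )

    def forward(self, desc0: torch.Tensor, desc1: torch.Tensor):
        # desc0, desc1: (batch, num_keypoints, dim) descriptor sets of two images.
        for sa, ca in zip(self.self_attn, self.cross_attn):
            # Self-attention within each image, with residual connections.
            desc0 = desc0 + sa(desc0, desc0, desc0)[0]
            desc1 = desc1 + sa(desc1, desc1, desc1)[0]
            # Cross-attention between the two images (queries from one image,
            # keys/values from the other), also residual.
            d0 = desc0 + ca(desc0, desc1, desc1)[0]
            d1 = desc1 + ca(desc1, desc0, desc0)[0]
            desc0, desc1 = d0, d1
        return desc0, desc1
```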
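The setup row also mentions 10 iterations of the Sinkhorn algorithm to obtain the assignment matrix. Below is a minimal log-domain sketch of that normalization. It is an assumption-laden illustration, not the authors' code: the actual pipeline, following SuperGlue, augments the score matrix with dustbin rows/columns for unmatched features, which this sketch omits, and the function name `log_sinkhorn` is hypothetical.

```python
import torch

def log_sinkhorn(scores: torch.Tensor, n_iters: int = 10) -> torch.Tensor:
    """Log-domain Sinkhorn normalization of a matching score matrix.

    Alternately normalizes rows and columns so the result approaches a
    doubly-stochastic assignment matrix. Working in log space keeps the
    iterations numerically stable.
    """
    log_p = scores
    for _ in range(n_iters):
        # Row normalization: each row sums to 1 in probability space.
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)
        # Column normalization: each column sums to 1 in probability space.
        log_p = log_p - torch.logsumexp(log_p, dim=-2, keepdim=True)
    return log_p.exp()

# Example matching the setup row: 1k features per image, 10 iterations.
scores = torch.randn(1000, 1000)
assignment = log_sinkhorn(scores, n_iters=10)
```

With the dustbin augmentation omitted here, every feature is forced to match; the real matcher uses the learned dustbin scores to let occluded or unrepeatable features remain unmatched.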