MTLDesc: Looking Wider to Describe Better
Authors: Changwei Wang, Rongtao Xu, Yuyang Zhang, Shibiao Xu, Weiliang Meng, Bin Fan, Xiaopeng Zhang
AAAI 2022, pp. 2388-2396 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With the above innovations, the performance of our MTLDesc significantly surpasses the prior state-of-the-art local descriptors on the HPatches, Aachen Day-Night localization and InLoc indoor localization benchmarks. Our code is available at https://github.com/vignywang/MTLDesc. Experiments (Dataset and Metrics): We use the popular HPatches benchmark (Balntas et al. 2017) for ablation studies and comparisons. |
| Researcher Affiliation | Academia | 1 NLPR, Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, Beijing University of Posts and Telecommunications; 3 Zhejiang Lab; 4 School of Artificial Intelligence, University of Chinese Academy of Sciences; 5 School of Automation and Electrical Engineering, University of Science and Technology Beijing. {wangchangwei2019, xurongtao2019, yuyang.zhang, weiliang.meng, xiaopeng.zhang}@ia.ac.cn, shibiaoxu@bupt.edu.cn, bin.fan@ieee.org |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. The methodology is described in text and diagrams. |
| Open Source Code | Yes | Our code is available at https://github.com/vignywang/MTLDesc. |
| Open Datasets | Yes | Data Preparation: We use MegaDepth (Li and Snavely 2018) to generate the training data with dense pixel-wise correspondences. We use the popular HPatches benchmark (Balntas et al. 2017) for ablation studies and comparisons. We resort to the Aachen Day-Night v1.1 (Sattler et al. 2012) and InLoc indoor visual localization (Taira et al. 2018) benchmarks to further demonstrate the effectiveness of our MTLDesc. |
| Dataset Splits | Yes | Data Preparation: We use MegaDepth (Li and Snavely 2018) to generate the training data with dense pixel-wise correspondences. The MegaDepth dataset contains image pairs with known pose and depth information from 196 different scenes. Following the settings in D2-Net, we take 118 scenes from all scenes as the training set. We randomly select 100 image pairs from each scene, and crop 400×400 image blocks from the original images for training. Thus, we get 11,800 image pairs with dense pixel correspondence. ... In summary, our training data consists of 23,600 image pairs in total. We use the popular HPatches benchmark (Balntas et al. 2017) for ablation studies and comparisons. Following previous methods, we use 108 sequences with viewpoint or illumination variations after excluding high-resolution sequences from the 116 available sequences. We resort to the Aachen Day-Night v1.1 (Sattler et al. 2012) and InLoc indoor visual localization (Taira et al. 2018) benchmarks to further demonstrate the effectiveness of our MTLDesc. A minimal pair-sampling sketch appears after this table. |
| Hardware Specification | Yes | The whole training process typically converges in 30 epochs and takes about 14 hours with a single NVIDIA Titan V GPU. |
| Software Dependencies | No | The paper mentions the software used but does not provide version numbers. "Our method implemented by PyTorch runs at 24 FPS (real time) on 480×640 images with a single NVIDIA Titan V GPU." A footnote adds: "We also implemented our method by using MindSpore (https://www.mindspore.cn/) and observed similar performance." No specific version numbers for PyTorch or MindSpore are provided. |
| Experiment Setup | Yes | The Adam optimizer with poly learning rate policy is used to optimize the network, and the learning rate decays from 0.001. The training image size is set to 400×400 with a training batch size of 12. A minimal configuration sketch is shown below. |
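
The 11,800-pair figure quoted in the Dataset Splits row follows directly from the stated split (118 training scenes × 100 pairs per scene). Below is a minimal, hypothetical Python sketch of that sampling step; the scene list, the pair representation, and the function name `sample_training_pairs` are placeholders, not the authors' actual data pipeline, which additionally crops 400×400 patches and adds a second source of pairs to reach 23,600 in total.

```python
import random

def sample_training_pairs(scenes, num_train_scenes=118, pairs_per_scene=100, seed=0):
    """Pick the training scenes and randomly draw image pairs from each (illustrative only)."""
    rng = random.Random(seed)
    train_scenes = scenes[:num_train_scenes]  # D2-Net-style split: 118 of the 196 MegaDepth scenes
    samples = []
    for scene_id, image_pairs in train_scenes:
        chosen = rng.sample(image_pairs, min(pairs_per_scene, len(image_pairs)))
        samples.extend((scene_id, pair) for pair in chosen)
    return samples

# Toy usage: 118 scenes x 100 pairs = 11,800 training pairs
toy_scenes = [(i, [(f"img_{i}_{j}_a.jpg", f"img_{i}_{j}_b.jpg") for j in range(150)])
              for i in range(196)]
print(len(sample_training_pairs(toy_scenes)))  # 11800
```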
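
The Experiment Setup row translates into a small amount of PyTorch configuration. The sketch below reflects only what the paper states (Adam, poly learning-rate decay from 0.001, batch size 12, 400×400 inputs); the poly exponent of 0.9, the stand-in convolutional model, and the placeholder loss are assumptions for illustration, not the authors' implementation.

```python
import torch

# Stand-in network; the real MTLDesc model lives in the authors' repository.
model = torch.nn.Conv2d(3, 128, kernel_size=3, padding=1)

# "The Adam optimizer with poly learning rate policy ... decays from 0.001."
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

max_steps = 30 * (23_600 // 12)  # roughly 30 epochs over 23,600 pairs at batch size 12
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: (1.0 - min(step, max_steps) / max_steps) ** 0.9,  # assumed exponent 0.9
)

for step in range(5):  # a few illustrative steps; the paper reports ~30 epochs to converge
    batch = torch.randn(12, 3, 400, 400)   # batch size 12, 400x400 training crops
    loss = model(batch).mean()             # placeholder loss, not the MTLDesc training losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr()[0])
```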