Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization
Authors: Zhenbo Song, Xianghui Ze, Jianfeng Lu, Yujiao Shi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate significant improvements compared to state-of-the-art methods. Notably, our approach reduces the median localization error by 89%, 19%, 80%, and 35% on the KITTI, Ford multi-AV, VIGOR, and Oxford Robot Car datasets, respectively. (Abstract) In this section, we will begin by introducing the dataset and evaluation metrics employed. After that, we conduct a comparison between our proposed methodology and state-of-the-art approaches. Finally, we will present a comprehensive ablative study. (Section 4) |
| Researcher Affiliation | Academia | Zhenbo Song¹, Xianghui Ze¹, Jianfeng Lu¹, Yujiao Shi² (¹Nanjing University of Science and Technology, ²ShanghaiTech University) |
| Pseudocode | No | The paper describes its method in prose and with a system diagram (Figure 2) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code or a link to a code repository for their proposed method. |
| Open Datasets | Yes | We conducted tests on the KITTI[3, 13], Ford multi-AV[1, 13], VIGOR[5, 24], and Oxford Robot Car[8, 9, 22] datasets. The datasets used in this paper are obtained under academic licenses and are not original datasets specifically created for this work. A more detailed description of each dataset will be provided in the appendix. |
| Dataset Splits | Yes | The dataset is divided into Training, Test1, and Test2 subsets. The images in Test1 are from the same area as the images in the training set, while the images in Test2 are from different areas. (Appendix A, KITTI) Ground images captured on one date were used for training, while images captured on another date were used for testing. (Appendix A, Ford Multi AV) The dataset includes two evaluation splits: same-area and cross-area, based on whether the images in training and testing sets are from the same region. (Appendix A, VIGOR) |
| Hardware Specification | Yes | We conducted 25 rounds of training on two TITAN V GPUs with a batch size of 6. |
| Software Dependencies | No | The paper mentions various software components and architectures like RAFT, ResNet18, and Adam optimizer, but it does not provide specific version numbers for any of these components or the underlying deep learning framework used. |
| Experiment Setup | Yes | We employed ResNet18 to extract features from both ground and satellite images, respectively, ensuring that the resolution was maintained at 1/8th of the original with 256 channels. Subsequently, in the dense flow estimation stage, we performed 12 iterations to obtain the final dense matching relationship. During the training period, we utilized the Adam optimizer [7] for end-to-end training, with a learning rate of 2 × 10⁻⁵, β1 = 0.9, and β2 = 0.999. The entire training schedule consisted of 25 epochs. At the 15th epoch, we adjusted the parameter κ in the loss function from 200 to 20 and β from 1 to 10. We conducted 25 rounds of training on two TITAN V GPUs with a batch size of 6. |
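
The hyperparameters reported in the Experiment Setup row can be collected into a single configuration sketch. This is a hypothetical reconstruction for readability, not the authors' released code (the paper ships none); all names below are illustrative.

```python
# Hypothetical training configuration assembled from the paper's reported
# hyperparameters (Section 4 / Experiment Setup row above). Field names are
# our own; the paper does not publish a config file.
train_config = {
    "backbone": "ResNet18",       # shared feature extractor for ground and satellite images
    "feature_resolution": 1 / 8,  # feature maps at 1/8th of input resolution
    "feature_channels": 256,
    "flow_iterations": 12,        # dense flow refinement iterations
    "optimizer": "Adam",
    "learning_rate": 2e-5,
    "betas": (0.9, 0.999),        # Adam's beta1 and beta2
    "epochs": 25,
    "batch_size": 6,              # across two TITAN V GPUs
}

def loss_weights(epoch: int) -> tuple[float, float]:
    """Return (kappa, beta) for the loss: kappa drops from 200 to 20 and
    beta rises from 1 to 10 at the 15th epoch, per the paper's schedule."""
    return (200.0, 1.0) if epoch < 15 else (20.0, 10.0)
```

The schedule function makes the mid-training switch explicit: anyone re-running the 25-epoch schedule can verify that epochs 0–14 use (κ=200, β=1) and epochs 15–24 use (κ=20, β=10).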