ParaFormer: Parallel Attention Transformer for Efficient Feature Matching
Authors: Xiaoyong Lu, Yaping Yan, Bin Kang, Songlin Du
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Sufficient experiments on various applications, including homography estimation, pose estimation, and image matching, demonstrate that ParaFormer achieves state-of-the-art performance while maintaining high efficiency. |
| Researcher Affiliation | Academia | ¹Southeast University, Nanjing, China; ²Nanjing University of Posts and Telecommunications, Nanjing, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | The homography model is pretrained on the R1M dataset (Radenović et al. 2018), and then the model is finetuned on the MegaDepth dataset (Li and Snavely 2018) for outdoor pose estimation and image matching tasks. |
| Dataset Splits | Yes | We split R1M dataset (Radenović et al. 2018), which contains over a million images of Oxford and Paris, into training, validation, and testing sets. |
| Hardware Specification | Yes | All models are trained on a single NVIDIA 3070Ti GPU. |
| Software Dependencies | No | The paper mentions optimizers (AdamW) and models (Transformer, U-Net) but does not provide specific version numbers for software libraries like PyTorch or TensorFlow, or other dependencies. |
| Experiment Setup | Yes | On the R1M dataset, we employ the AdamW (Kingma and Ba 2014) optimizer for 10 epochs using the cosine decay learning rate scheduler and 1 epoch of linear warm-up. A batch size of 8 and an initial learning rate of 0.0001 are used. On the MegaDepth dataset, we use the same AdamW optimizer for 50 epochs using the same learning rate scheduler and linear warm-up. A batch size of 2 and a lower initial learning rate of 0.00001 are used. For ParaFormer, we stack L = 9 parallel attention layers, and all intermediate features have the same dimension C = 256. For ParaFormer-U, the depth of each stage is {2, 1, 2, 1, 2}, resulting in a total of L = 8 parallel attention layers, and the intermediate feature dimension of each stage is {256, 384, 128, 384, 256}. |
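
The training recipe quoted in the Experiment Setup row maps onto a standard optimizer/scheduler configuration. Below is a minimal sketch, assuming a PyTorch setup; the `cosine_with_linear_warmup` helper, the config dictionaries, and the stand-in model are illustrative assumptions and not the authors' code, which (per the Open Source Code row) is not released.

```python
import math
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def cosine_with_linear_warmup(optimizer, warmup_epochs, total_epochs):
    """Linear warm-up for `warmup_epochs`, then cosine decay over the remaining epochs.

    Stepped once per epoch here for brevity; in practice the 1-epoch warm-up
    would usually be stepped per iteration within the first epoch.
    """
    def lr_lambda(epoch):
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)

# Architecture hyperparameters as quoted from the paper.
paraformer_cfg = {"num_layers": 9, "dim": 256}        # L = 9, C = 256
paraformer_u_cfg = {"depths": [2, 1, 2, 1, 2],        # L = 8 layers in total
                    "dims": [256, 384, 128, 384, 256]}

# A dummy module stands in for the actual ParaFormer network.
model = nn.Linear(paraformer_cfg["dim"], paraformer_cfg["dim"])

# R1M pre-training: 10 epochs, 1 warm-up epoch, batch size 8, initial lr 1e-4.
opt_pretrain = AdamW(model.parameters(), lr=1e-4)
sched_pretrain = cosine_with_linear_warmup(opt_pretrain, warmup_epochs=1, total_epochs=10)

# MegaDepth fine-tuning: 50 epochs, 1 warm-up epoch, batch size 2, initial lr 1e-5.
opt_finetune = AdamW(model.parameters(), lr=1e-5)
sched_finetune = cosine_with_linear_warmup(opt_finetune, warmup_epochs=1, total_epochs=50)

for epoch in range(10):
    # ... one pass over the R1M training split would go here ...
    sched_pretrain.step()
```

The batch sizes (8 for R1M, 2 for MegaDepth) would be set in the respective data loaders; only the optimizer, schedule, and architecture hyperparameters from the quoted setup are reflected in this sketch.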