ParaFormer: Parallel Attention Transformer for Efficient Feature Matching

Authors: Xiaoyong Lu, Yaping Yan, Bin Kang, Songlin Du

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Sufficient experiments on various applications, including homography estimation, pose estimation, and image matching, demonstrate that ParaFormer achieves state-of-the-art performance while maintaining high efficiency.
Researcher Affiliation | Academia | (1) Southeast University, Nanjing, China; (2) Nanjing University of Posts and Telecommunications, Nanjing, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | The homography model is pretrained on the R1M dataset (Radenović et al. 2018), and then the model is finetuned on the MegaDepth dataset (Li and Snavely 2018) for outdoor pose estimation and image matching tasks.
Dataset Splits | Yes | We split the R1M dataset (Radenović et al. 2018), which contains over a million images of Oxford and Paris, into training, validation, and testing sets.
Hardware Specification | Yes | All models are trained on a single NVIDIA 3070Ti GPU.
Software Dependencies | No | The paper mentions optimizers (AdamW) and models (Transformer, U-Net) but does not provide specific version numbers for software libraries like PyTorch or TensorFlow, or other dependencies.
Experiment Setup | Yes | On the R1M dataset, we employ the AdamW (Kingma and Ba 2014) optimizer for 10 epochs using the cosine decay learning rate scheduler and 1 epoch of linear warm-up. A batch size of 8 and an initial learning rate of 0.0001 are used. On the MegaDepth dataset, we use the same AdamW optimizer for 50 epochs using the same learning rate scheduler and linear warm-up. A batch size of 2 and a lower initial learning rate of 0.00001 are used. For ParaFormer, we stack L = 9 parallel attention layers, and all intermediate features have the same dimension C = 256. For ParaFormer-U, the depth of each stage is {2, 1, 2, 1, 2}, resulting in a total of L = 8 parallel attention layers, and the intermediate feature dimension of each stage is {256, 384, 128, 384, 256}.
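
Since no code is released, the quoted training setup can only be approximated. Below is a minimal PyTorch sketch, not the authors' implementation: the ParaFormer / ParaFormer-U model definitions are omitted, make_optimizer_and_scheduler is a hypothetical helper, and the one-epoch warm-up for the MegaDepth stage is an assumption based on the paper's statement that it uses "the same learning rate scheduler and linear warm-up" as the R1M stage.

```python
# Sketch of the reported training configuration (assumptions noted above).
import math
import torch

# Architecture hyperparameters as reported in the paper.
PARAFORMER_CFG = dict(num_layers=9, dim=256)            # ParaFormer: L = 9, C = 256
PARAFORMER_U_CFG = dict(stage_depths=(2, 1, 2, 1, 2),   # ParaFormer-U: L = 8 in total
                        stage_dims=(256, 384, 128, 384, 256))

def make_optimizer_and_scheduler(model, dataset):
    """Build AdamW + cosine-decay LR schedule with linear warm-up for the two training stages."""
    if dataset == "R1M":            # homography pretraining
        lr, epochs, warmup_epochs, batch_size = 1e-4, 10, 1, 8
    elif dataset == "MegaDepth":    # pose estimation / image matching fine-tuning
        lr, epochs, warmup_epochs, batch_size = 1e-5, 50, 1, 2  # 1-epoch warm-up assumed
    else:
        raise ValueError(f"unknown dataset: {dataset}")

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    def lr_lambda(epoch):
        # Linear warm-up for `warmup_epochs`, then cosine decay toward zero.
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler, batch_size

# Example usage (model class is hypothetical and not provided by the paper):
# model = ParaFormer(**PARAFORMER_CFG)
# optimizer, scheduler, batch_size = make_optimizer_and_scheduler(model, "R1M")
```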