CATs: Cost Aggregation Transformers for Visual Correspondence
Authors: Seokju Cho, Sunghwan Hong, Sangryul Jeon, Yunsung Lee, Kwanghoon Sohn, Seungryong Kim
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies. Code and trained models are available at https://sunghwanhong.github.io/CATs/. |
| Researcher Affiliation | Academia | Seokju Cho (Yonsei University), Sunghwan Hong (Korea University), Sangryul Jeon (Yonsei University), Yunsung Lee (Korea University), Kwanghoon Sohn (Yonsei University), Seungryong Kim (Korea University) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and trained models are available at https://sunghwanhong.github.io/CATs/. |
| Open Datasets | Yes | SPair-71k [38] provides total 70,958 image pairs... we also consider PF-PASCAL [12] containing 1,351 image pairs from 20 categories and PF-WILLOW [11] containing 900 image pairs from 4 categories, each dataset providing corresponding ground-truth annotations. |
| Dataset Splits | No | The paper specifies training and test splits for the datasets (e.g., 'we train our network on the training split and evaluated on the test split'), but does not explicitly describe a validation split. |
| Hardware Specification | Yes | For a fair comparison, the results are obtained using a single NVIDIA GeForce RTX 2080 Ti GPU and Intel Core i7-10700 CPU. |
| Software Dependencies | No | The paper states 'We implemented our network using PyTorch [40]', but does not provide specific version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | For the hyper-parameters for Transformer encoder, we set the depth as 1 and the number of heads as 6. We resize the spatial size of the input image pairs to 256×256 and a sequence of selected features are resized to 16×16. We use a learnable positional embedding [10], instead of fixed [61]. We implemented our network using PyTorch [40], and AdamW [33] optimizer with an initial learning rate of 3e-5 for the CATs layers and 3e-6 for the backbone features are used, which we gradually decrease during training. |
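The two-learning-rate optimizer setup quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the function and variable names are hypothetical, and since the paper only says the rates are "gradually decreased", the step-decay schedule here is an assumption.

```python
def make_param_groups(cats_params, backbone_params):
    """Two parameter groups with the initial learning rates reported in
    the paper: 3e-5 for the CATs (cost aggregation) layers and 3e-6 for
    the backbone features. With PyTorch, this list would be passed to
    torch.optim.AdamW(...)."""
    return [
        {"params": cats_params, "lr": 3e-5},      # CATs layers
        {"params": backbone_params, "lr": 3e-6},  # backbone features
    ]

def decayed_lr(base_lr, step, decay=0.5, every=1000):
    """Illustrative step decay: halve the rate every 1000 steps.
    The paper does not specify the actual decay schedule."""
    return base_lr * (decay ** (step // every))
```

For example, `decayed_lr(3e-5, 2000)` returns the CATs-layer rate after two decay intervals under this assumed schedule.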