GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs
Authors: Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng, Hujun Bao, Xiaowei Zhou
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that GIFT outperforms state-of-the-art methods on several benchmark datasets and practically improves the performance of relative pose estimation. |
| Researcher Affiliation | Collaboration | State Key Lab of CAD&CG, ZJU-Sensetime Joint Lab of 3D Vision, Zhejiang University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Corresponding authors: {xzhou,bao}@cad.zju.edu.cn. Project page: https://zju3dv.github.io/GIFT. |
| Open Datasets | Yes | The proposed GIFT is trained on a synthetic dataset. We randomly sample images from MS-COCO [31] and warp images with reasonable homographies defined in SuperPoint [11] to construct image pairs for training. [...] we further finetune GIFT on the GL3D [50] dataset (a hedged sketch of such warp-based pair construction appears after the table). |
| Dataset Splits | No | The paper mentions using well-known datasets like MS-COCO and GL3D for training and HPSequences and SUN3D for evaluation. However, it does not explicitly provide the specific percentages or counts for training, validation, and test splits used in their own experimental setup, nor does it refer to a standard split by name (e.g., 'we use the standard MS-COCO train/val/test split'). |
| Hardware Specification | Yes | Given a 480×360 image and randomly-distributed 1024 interest points in the image, the PyTorch [46] implementation of GIFT-6 costs about 65.2 ms on a desktop with an Intel i7 3.7GHz CPU and a GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch [46]' as the implementation framework but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The output feature dimension n₀ of the vanilla CNN is 32. In both group CNNs, H defined in Eq. (2) is {r, r⁻¹, s, s⁻¹, rs, rs⁻¹, r⁻¹s, r⁻¹s⁻¹, e}, where e is the identity transformation. [...] The output feature dimensions nα and nβ of two group CNNs are 8 and 16 respectively, which results in a 128-dimensional descriptor after bilinear pooling. [...] The margin γ is set to 0.5 in all experiments. |
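
To make the dimensions quoted in the last row concrete, the following is a minimal PyTorch sketch of a bilinear pooling step that fuses an 8-channel and a 16-channel group feature (one vector per transformation in H) into a 128-dimensional descriptor. The function name, tensor layout, and L2 normalization here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def bilinear_group_pooling(feat_a, feat_b):
    """Hypothetical sketch of bilinear pooling over group features.

    feat_a: (N, |H|, n_alpha) features from the first group CNN
    feat_b: (N, |H|, n_beta)  features from the second group CNN
    Returns (N, n_alpha * n_beta) descriptors.
    """
    # Outer product per group element, summed (pooled) over the group dimension.
    pooled = torch.einsum('nga,ngb->nab', feat_a, feat_b)
    desc = pooled.flatten(start_dim=1)
    # L2-normalize so descriptors can be compared under a margin-based loss.
    return F.normalize(desc, dim=1)

# With the reported settings (n_alpha = 8, n_beta = 16, |H| = 9 transformations),
# 1024 interest points yield 1024 descriptors of dimension 8 * 16 = 128.
feat_a = torch.randn(1024, 9, 8)
feat_b = torch.randn(1024, 9, 16)
print(bilinear_group_pooling(feat_a, feat_b).shape)  # torch.Size([1024, 128])
```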
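
The training-data row notes that MS-COCO images are warped with "reasonable homographies" in the style of SuperPoint to form training pairs. The snippet below is a hypothetical OpenCV sketch of that idea: jitter the image corners, fit a homography, and warp. The jitter range and helper names are assumptions for illustration, not the paper's actual sampling scheme.

```python
import numpy as np
import cv2

def random_homography(h, w, max_offset=0.2):
    """Sample a homography by randomly perturbing the four image corners.
    max_offset is the maximum corner displacement as a fraction of image size
    (an assumed range, not the authors' parameters)."""
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = (np.random.rand(4, 2) - 0.5) * 2 * max_offset * np.float32([w, h])
    dst = (src + jitter).astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)

def make_training_pair(image):
    """Warp an image with a random homography to form a synthetic training pair.
    Pixel correspondences between the two views follow directly from H."""
    h, w = image.shape[:2]
    H = random_homography(h, w)
    warped = cv2.warpPerspective(image, H, (w, h))
    return image, warped, H
```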