LoCo: Learning 3D Location-Consistent Image Features with a Memory-Efficient Ranking Loss
Authors: Dominik Kloepfer, João F. Henriques, Dylan Campbell
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We showcase the improved location consistency of our trained feature extractor directly on a multi-view consistency task, as well as the downstream task of scene-stable panoptic segmentation, significantly outperforming previous state-of-the-art. |
| Researcher Affiliation | Academia | Dominik A. Kloepfer, Visual Geometry Group, University of Oxford (dominik@robots.ox.ac.uk); João Henriques, Visual Geometry Group, University of Oxford (joao@robots.ox.ac.uk); Dylan Campbell, School of Computing, Australian National University (dylan.campbell@anu.edu.au) |
| Pseudocode | No | The paper describes methods and architectures but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | We provide further implementation details regarding the efficient sampling of patch pairs in Appendix D, and will publicly release our training code. |
| Open Datasets | Yes | The training dataset we use comprises 59 environments of the Matterport3D dataset, resizing the images to 256 × 320 pixels. The Matterport3D dataset is particularly suitable for our task of enforcing multi-view consistency due to its diversity and the way it captures varied viewpoints of the same scene through panorama cropping. ... Like El Banani et al., we evaluate on the Paired ScanNet [10] split proposed by Sarlin et al. [36]. |
| Dataset Splits | No | The paper discusses training and testing datasets, but does not explicitly provide details about a validation dataset split. |
| Hardware Specification | Yes | In contrast, CroCo trains the entire network (85 million parameters) with a multi-view loss and on significantly larger datasets with greater computational resources (8 A100 GPUs vs. 1 RTX8000 GPU). |
| Software Dependencies | No | The paper mentions the FAISS library but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use values of ρ = 0.5 m for the positive radius, κ = 5.0 m for the negative radius, τ = 0.01 for the sigmoid temperature, and = 0.076 for the saturation threshold. ... Instead, we adapt the architecture used by DINO-Tracker [41], keeping pre-trained DINO [5] features frozen and training a convolutional neural network to learn additive residuals to those features. Table 3: Hyperparameters of the convolutional layers of the residual network used for the pixel-correspondence task in Section 4.3. |
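The Experiment Setup row gives a positive radius ρ, a negative radius κ and a sigmoid temperature τ. The following sketch shows one plausible way these three values could interact. It assumes three things. First, patch pairs whose 3D points lie within ρ of each other count as positives. Second, pairs farther apart than κ count as negatives, and pairs in between are ignored. Third, the ranking loss is a Smooth-AP-style objective in which τ scales a sigmoid relaxation of the rank. This is not the authors' memory-efficient implementation. The names `LossConfig`, `label_pair`, `smooth_ap` and `ranking_loss` are hypothetical. The saturation threshold is left out because its exact role is not stated here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class LossConfig:
    pos_radius: float = 0.5    # rho in metres (value from the paper)
    neg_radius: float = 5.0    # kappa in metres (value from the paper)
    temperature: float = 0.01  # tau, sigmoid temperature (value from the paper)


def label_pair(p: Point, q: Point, cfg: LossConfig) -> Optional[int]:
    """Assumed rule: 1 = positive, 0 = negative, None = ignored (between the radii)."""
    d = math.dist(p, q)
    if d <= cfg.pos_radius:
        return 1
    if d >= cfg.neg_radius:
        return 0
    return None


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (only exponentiates non-positive values)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def smooth_ap(sims: Sequence[float], labels: Sequence[int], tau: float) -> float:
    """Smooth-AP for one query: the hard 'ranked above' indicator becomes sigmoid(diff / tau)."""
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    if not pos:
        return 0.0
    total = 0.0
    for i in pos:
        # Soft count of candidates scoring above candidate i (all candidates, then positives only).
        above_all = sum(sigmoid((sims[j] - sims[i]) / tau) for j in range(len(sims)) if j != i)
        above_pos = sum(sigmoid((sims[j] - sims[i]) / tau) for j in pos if j != i)
        total += (1.0 + above_pos) / (1.0 + above_all)
    return total / len(pos)


def ranking_loss(query: Point, cands: Sequence[Point], sims: Sequence[float],
                 cfg: LossConfig) -> float:
    """1 - Smooth-AP after dropping candidates in the ignored band between rho and kappa."""
    kept_sims: List[float] = []
    kept_labels: List[int] = []
    for xyz, s in zip(cands, sims):
        lab = label_pair(query, xyz, cfg)
        if lab is not None:
            kept_sims.append(s)
            kept_labels.append(lab)
    return 1.0 - smooth_ap(kept_sims, kept_labels, cfg.temperature)


cfg = LossConfig()
query = (0.0, 0.0, 0.0)
cands = [(0.1, 0.0, 0.0), (0.3, 0.2, 0.0), (2.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 8.0, 0.0)]
good = ranking_loss(query, cands, [0.9, 0.8, 0.5, 0.1, 0.2], cfg)  # close to 0: positives ranked first
bad = ranking_loss(query, cands, [0.1, 0.2, 0.5, 0.9, 0.8], cfg)   # roughly 0.58: positives ranked last
```

With τ = 0.01, a similarity gap of 0.1 already pushes the sigmoid to within about 5e-5 of 0 or 1. This means the relaxation behaves almost like a hard rank for well-separated scores.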
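The same row describes a DINO-Tracker-style design. In this design, frozen DINO features are refined by a trained CNN that predicts additive residuals. The following minimal pure-Python sketch illustrates that idea. It uses a single zero-padded 3×3 convolution as a stand-in for the residual network. The actual layer configuration is given in the paper's Table 3 and is not reproduced here, and the helper names `conv3x3`, `refine` and `zero_kernel` are hypothetical. When every weight and bias is zero, the refined features equal the frozen ones, which illustrates how a residual head can start out as an identity mapping.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]          # H x W grid of C-dimensional descriptors
Kernel = List[List[List[List[float]]]]   # indexed [dy][dx][c_out][c_in] over 3 x 3 taps


def conv3x3(x: FeatureMap, weight: Kernel, bias: Vector) -> FeatureMap:
    """Zero-padded 3x3 convolution with stride 1, so the spatial size is preserved."""
    h, w, c_in = len(x), len(x[0]), len(x[0][0])
    out: FeatureMap = []
    for i in range(h):
        row: List[Vector] = []
        for j in range(w):
            acc = list(bias)  # one accumulator per output channel
            for dy in range(3):
                for dx in range(3):
                    y, xx = i + dy - 1, j + dx - 1
                    if 0 <= y < h and 0 <= xx < w:  # taps outside the map read zero padding
                        k, v = weight[dy][dx], x[y][xx]
                        for o in range(len(bias)):
                            acc[o] += sum(k[o][c] * v[c] for c in range(c_in))
            row.append(acc)
        out.append(row)
    return out


def refine(frozen: FeatureMap, weight: Kernel, bias: Vector) -> FeatureMap:
    """Frozen features plus a learned additive residual (output channels must equal input channels)."""
    residual = conv3x3(frozen, weight, bias)
    return [
        [[f + r for f, r in zip(fv, rv)] for fv, rv in zip(frow, rrow)]
        for frow, rrow in zip(frozen, residual)
    ]


def zero_kernel(c: int) -> Kernel:
    """All-zero 3x3 kernel mapping c channels to c channels."""
    return [[[[0.0] * c for _ in range(c)] for _ in range(3)] for _ in range(3)]


frozen = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # toy 2 x 2 map with C = 2
same = refine(frozen, zero_kernel(2), [0.0, 0.0])  # equals frozen: zero residual
```

In this sketch only the convolution weights would be trained. The DINO features enter `refine` as fixed inputs, which mirrors the "keeping pre-trained DINO features frozen" description at the level of data flow.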