LoCo: Learning 3D Location-Consistent Image Features with a Memory-Efficient Ranking Loss

Authors: Dominik Kloepfer, João F. Henriques, Dylan Campbell

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We showcase the improved location consistency of our trained feature extractor directly on a multi-view consistency task, as well as the downstream task of scene-stable panoptic segmentation, significantly outperforming previous state-of-the-art.
Researcher Affiliation | Academia | Dominik A. Kloepfer, Visual Geometry Group, University of Oxford (dominik@robots.ox.ac.uk); João Henriques, Visual Geometry Group, University of Oxford (joao@robots.ox.ac.uk); Dylan Campbell, School of Computing, Australian National University (dylan.campbell@anu.edu.au)
Pseudocode | No | The paper describes methods and architectures but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | We provide further implementation details regarding the efficient sampling of patch pairs in Appendix D, and will publicly release our training code.
Open Datasets | Yes | The training dataset we use comprises 59 environments of the Matterport3D dataset, resizing the images to 256 × 320 pixels. The Matterport3D dataset is particularly suitable for our task of enforcing multi-view consistency due to its diversity and the way it captures varied viewpoints of the same scene through panorama cropping. ... Like El Banani et al., we evaluate on the paired ScanNet [10] split proposed by Sarlin et al. [36].
Dataset Splits | No | The paper discusses training and testing datasets, but does not explicitly provide details about a validation dataset split.
Hardware Specification | Yes | In contrast, CroCo trains the entire network (85 million parameters) with a multi-view loss and on significantly larger datasets with greater computational resources (8 A100 GPUs vs. 1 RTX8000 GPU).
Software Dependencies | No | The paper mentions the FAISS library but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | We use values of ρ = 0.5m for the positive radius, κ = 5.0m for the negative radius, τ = 0.01 for the sigmoid temperature, and 0.076 for the saturation threshold. ... Instead, we adapt the architecture used by DINO-Tracker [41], keeping pre-trained DINO [5] features frozen and training a convolutional neural network to learn additive residuals to those features. Table 3: Hyperparameters of the convolutional layers of the residual network used for the pixel-correspondence task in Section 4.3.
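
To make the reported setup concrete, below is a minimal PyTorch sketch of the quoted hyperparameters and the residual architecture (frozen pre-trained DINO features plus a trainable convolutional residual head). The class name, layer sizes, feature dimension, and the assumption that the backbone returns dense feature maps are illustrative placeholders; the paper's actual per-layer values are given in its Table 3, which is not reproduced here.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's experiment setup.
LOCO_HPARAMS = {
    "positive_radius_m": 0.5,       # rho: 3D radius within which patches count as positives
    "negative_radius_m": 5.0,       # kappa: 3D radius beyond which patches count as negatives
    "sigmoid_temperature": 0.01,    # tau
    "saturation_threshold": 0.076,  # loss saturation threshold
}


class ResidualFeatureExtractor(nn.Module):
    """Illustrative sketch of the pixel-correspondence setup described in the paper:
    pre-trained DINO features are kept frozen and a small CNN learns additive residuals.
    Layer sizes here are placeholders, not the values from the paper's Table 3."""

    def __init__(self, dino_backbone: nn.Module, feat_dim: int = 384):
        super().__init__()
        self.dino = dino_backbone
        for p in self.dino.parameters():  # keep the pre-trained backbone frozen
            p.requires_grad = False
        # Placeholder residual head; only this part is trained.
        self.residual = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Assumes the backbone yields dense feature maps of shape (B, C, H', W').
        with torch.no_grad():
            feats = self.dino(images)
        return feats + self.residual(feats)  # additive residuals on frozen features
```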