Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Alligat0R: Pre-Training through Covisibility Segmentation for Relative Camera Pose Regression
Authors: Thibaut Loiseau, Guillaume Bourmaud, Vincent Lepetit
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that our novel pre-training method Alligat0R significantly outperforms CroCo in relative pose regression. Our experiments demonstrate that explicitly learning to understand covisibility relationships between image pairs leads to more robust and transferable features for relative pose regression compared to reconstruction-based approaches. 5 Experiments |
| Researcher Affiliation | Academia | (1) LIGM, Ecole des Ponts, Univ. Gustave Eiffel, CNRS, France; (2) Univ. Bordeaux, CNRS, Bordeaux INP, IMS, UMR 5218, France |
| Pseudocode | No | The paper describes the architecture and methodology in sections 3.1, 3.2, and 3.3 using descriptive text and mathematical equations, and Figure 2 provides an architecture overview. There are no explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | Alligat0R and Cub3 will be made publicly available. |
| Open Datasets | Yes | To enable our covisibility segmentation approach, we introduce Cub3, a large-scale dataset comprising two sub-datasets each of 2.5 million image pairs with dense covisibility annotations derived from both the autonomous driving nuScenes dataset [4], and the indoor ScanNet [8] dataset. Cub3 will be made publicly available. |
| Dataset Splits | Yes | For our experiments, we created two dataset variants, each containing 5M image pairs (2.5M from nuScenes and 2.5M from ScanNet): Cub3-50: Image pairs with at least 50% overlap, similar to the criterion used in CroCo [48]. Cub3-all: Image pairs with at least 5% overlap. |
| Hardware Specification | Yes | We implement Alligat0R using PyTorch and conduct all experiments on NVIDIA A100 GPUs. |
| Software Dependencies | No | We implement Alligat0R using PyTorch and conduct all experiments on NVIDIA A100 GPUs. The paper mentions PyTorch but does not specify its version or other software dependencies with version numbers. |
| Experiment Setup | Yes | We use a ViT-based encoder and transformer decoder backbone similar to CroCo, with 24 layers for the encoder and 12 for the decoder. For pre-training, we use the AdamW optimizer with a learning rate of 1.5e-4, weight decay of 0.05, and a batch size of 32 per GPU. We employ a cosine learning rate schedule with 2 epochs of warmup and train for 25 epochs on our Cub3 datasets. For fine-tuning on relative pose regression, we follow the two-phase approach described in Section 3.3 of the main paper. Phase 1: Freeze the backbone and train only the pose regression head for 5 epochs with a learning rate of 1e-4. Phase 2: Unfreeze the entire network and jointly train with both the pose regression loss and covisibility segmentation loss for an additional 10 epochs with a learning rate of 5e-5. |
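
The pre-training schedule and the two fine-tuning phases quoted in the Experiment Setup row can be written down as a small configuration sketch. This is an illustration only, not the authors' code: the function `lr_at_epoch`, the dictionary keys, the linear shape of the warmup, and the decay floor of zero are all assumptions, since the excerpt states only "cosine learning rate schedule with 2 epochs of warmup".

```python
import math

# Values quoted in the Experiment Setup row.
BASE_LR = 1.5e-4       # pre-training learning rate
WEIGHT_DECAY = 0.05    # AdamW weight decay
WARMUP_EPOCHS = 2
TOTAL_EPOCHS = 25


def lr_at_epoch(epoch, base_lr=BASE_LR, warmup_epochs=WARMUP_EPOCHS,
                total_epochs=TOTAL_EPOCHS, min_lr=0.0):
    """Learning rate at a (possibly fractional) epoch.

    Assumption: linear warmup from 0, then cosine decay down to min_lr.
    The excerpt does not specify the warmup shape or the final value.
    """
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    progress = min(progress, 1.0)  # clamp past the end of training
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Two-phase fine-tuning as described in the row; key names are hypothetical.
FINETUNE_PHASES = [
    {"name": "phase1", "epochs": 5, "lr": 1e-4,
     "freeze_backbone": True, "losses": ("pose",)},
    {"name": "phase2", "epochs": 10, "lr": 5e-5,
     "freeze_backbone": False, "losses": ("pose", "covisibility")},
]

if __name__ == "__main__":
    for e in (0, 1, 2, 10, 25):
        print(f"epoch {e:2d}: lr = {lr_at_epoch(e):.3e}")
    print("fine-tuning epochs:", sum(p["epochs"] for p in FINETUNE_PHASES))
```

In a PyTorch training loop, a function of this kind would typically be evaluated per step or per epoch and written into the optimizer's parameter groups. The table does not say how the authors wired their schedule.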