Consensus Learning with Deep Sets for Essential Matrix Estimation
Authors: Dror Moran, Yuval Margalit, Guy Trostianetsky, Fadi Khatib, Meirav Galun, Ronen Basri
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We trained and tested our method on both indoor and outdoor datasets. For an outdoor dataset, we used Yahoo's YFCC dataset [49], which contains 100 million images from Flickr, later reconstructed using SfM [18]. For an indoor dataset, we used the SUN3D dataset [53]. ... Our results are shown in Tables 1-3. ... We also tested our model using the deep-learning-based descriptor SuperPoint on both the YFCC and SUN3D datasets. ... In ablation studies we tested the importance of our noise head, i.e., the keypoint denoising process, by training our model with and without this head. |
| Researcher Affiliation | Academia | Department of Computer Science and Applied Mathematics, Weizmann Institute of Science |
| Pseudocode | No | The paper includes a network architecture diagram (Figure 1) but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/drormoran/NACNet. |
| Open Datasets | Yes | For an outdoor dataset, we used Yahoo's YFCC dataset [49], which contains 100 million images from Flickr, later reconstructed using SfM [18]. For an indoor dataset, we used the SUN3D dataset [53]. Additionally, we used the Phototourism dataset [19] to test our model's generalization across datasets. |
| Dataset Splits | No | For both datasets, we used the same preprocessing and dataset split as in [56], i.e., the camera poses are extracted from an SfM pipeline, and the test set is split into in-scene and cross-scene generalization. This specifies categories of splits for the test set, but not explicit training, validation, and test percentages or sample counts. |
| Hardware Specification | Yes | Training was run on NVIDIA Quadro RTX 6000 / DGX V100 / A40 GPUs, with a maximum memory usage of 5GB. ... For GPU resource usage, we used an NVIDIA GeForce RTX 2080Ti GPU and an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz. |
| Software Dependencies | No | For the loss function we set β_inliers = 1, β_outliers = 10, α_mod = 1, and α_ns = 100. We used a threshold of 3 × 10⁻³ for labeling inliers and outliers. In training, we used the ADAM [23] optimizer with a batch size of 32 image pairs and a learning rate of 10⁻⁴. The paper mentions the ADAM optimizer and cites OpenCV [6] and Kornia [41] in the references, but does not provide specific version numbers for these or other software dependencies necessary for replication. |
| Experiment Setup | Yes | For the loss function we set β_inliers = 1, β_outliers = 10, α_mod = 1, and α_ns = 100. We used a threshold of 3 × 10⁻³ for labeling inliers and outliers. In training, we used the ADAM [23] optimizer with a batch size of 32 image pairs and a learning rate of 10⁻⁴. The network consists of three consecutive NAC blocks, where we only use the output of the last block at inference time. The Set Encoders in the NAC blocks combine 12 set layers interleaved with SoftPlus activation, layer normalization, and skip connections in a ResNet-like architecture. We set the dimension of the Set Encoder to 512. The Classification and Noise Heads consist of two-layer MLPs interleaved with an activation function. We used a SoftPlus activation for the classification head and a LeakyReLU for the noise head. (A sketch of this setup appears below the table.) |
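To make the reported setup concrete, the following is a minimal PyTorch sketch of a single NAC block and the quoted training hyperparameters. The module names (`SetLayer`, `NACBlock`), the mean-pooled set context, and the 4-dimensional noise-head output are illustrative assumptions based on the description above; they are not taken from the official NACNet repository.

```python
import torch
import torch.nn as nn

# Hedged sketch, assuming a Deep Sets-style permutation-equivariant set layer;
# names and the noise-head output size are assumptions, not the authors' code.

class SetLayer(nn.Module):
    """One set layer: per-point linear map plus a global (mean-pooled) context,
    with layer normalization and a ResNet-like skip connection."""
    def __init__(self, dim=512):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
        self.act = nn.Softplus()

    def forward(self, x):                      # x: (batch, num_matches, dim)
        context = x.mean(dim=1, keepdim=True)  # permutation-invariant set context
        return x + self.act(self.norm(self.fc(x + context)))

class NACBlock(nn.Module):
    """Set Encoder (12 set layers, dim 512) followed by two-layer MLP heads:
    SoftPlus for the inlier classification head, LeakyReLU for the noise head."""
    def __init__(self, dim=512, num_layers=12):
        super().__init__()
        self.encoder = nn.Sequential(*[SetLayer(dim) for _ in range(num_layers)])
        self.cls_head = nn.Sequential(nn.Linear(dim, dim), nn.Softplus(),
                                      nn.Linear(dim, 1))
        # 4 outputs per match (an assumed 2D offset per keypoint in each image).
        self.noise_head = nn.Sequential(nn.Linear(dim, dim), nn.LeakyReLU(),
                                        nn.Linear(dim, 4))

    def forward(self, x):
        feat = self.encoder(x)
        return self.cls_head(feat), self.noise_head(feat)

# Hyperparameters reported in the paper.
beta_inliers, beta_outliers = 1.0, 10.0    # loss weights
alpha_mod, alpha_ns = 1.0, 100.0           # loss weights
inlier_threshold = 3e-3                    # inlier/outlier labeling threshold
batch_size, learning_rate = 32, 1e-4

block = NACBlock()                         # the full network stacks three such blocks
optimizer = torch.optim.Adam(block.parameters(), lr=learning_rate)

# Example forward pass on a dummy batch of 2000 embedded correspondences per pair.
x = torch.randn(batch_size, 2000, 512)
inlier_logits, noise = block(x)            # shapes: (32, 2000, 1), (32, 2000, 4)
```

Lifting the raw correspondences (4 coordinates per match) to the 512-dimensional Set Encoder width would require an additional input embedding, which the quoted excerpt does not specify and the sketch omits.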