DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras
Authors: Zachary Teed, Jia Deng
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive evaluation across four different datasets and three different sensor modalities, demonstrating state-of-the-art performance in all cases. We also include ablation studies that shed light on important design decisions and hyperparameters. [Sec. 4, Experiments:] We experiment on a diverse set of datasets and sensor modalities. We compare to both deep learning and established classical SLAM algorithms and put specific emphasis on cross-dataset generalization. Following prior work, we evaluate the accuracy of the camera trajectory [31, 15, 41], primarily using Absolute Trajectory Error (ATE) [43]. (An ATE sketch follows the table.) |
| Researcher Affiliation | Academia | Zachary Teed, Jia Deng; Princeton University; {zteed,jiadeng}@princeton.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks found. |
| Open Source Code | Yes | The URL to our open source code is https://github.com/princeton-vl/DROID-SLAM. |
| Open Datasets | Yes | Our network is trained entirely on monocular video from the synthetic TartanAir dataset [54]. W. Wang, D. Zhu, X. Wang, Y. Hu, Y. Qiu, C. Wang, Y. Hu, A. Kapoor, and S. Scherer. TartanAir: A dataset to push the limits of visual SLAM. arXiv preprint arXiv:2003.14338, 2020. |
| Dataset Splits | No | The paper mentions 'test split' and 'training set' but does not explicitly provide details for a validation split or percentages for train/validation/test splits in the main text. |
| Hardware Specification | Yes | Training takes 1 week on 4 RTX-3090 GPUs. Our system can run in real-time with 2 3090 GPUs. All results on TUM-RGBD can be produced on a single 1080Ti graphics card. |
| Software Dependencies | No | The paper mentions "PyTorch" and "LieTorch" but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We train our network for 250k steps with a batch size of 4, resolution 384×512, and 7-frame clips, and unroll 15 update iterations. (A configuration sketch follows the table.) |
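
As a compact reference for the Experiment Setup and Hardware Specification rows, the sketch below collects the reported training hyperparameters into a single Python configuration. The key names are hypothetical, chosen for illustration; the released DROID-SLAM code may organize these settings differently. Values are taken from the quoted text above.

```python
# Hypothetical key names; values quoted from the paper.
train_config = {
    "total_steps": 250_000,      # "250k steps"
    "batch_size": 4,
    "image_size": (384, 512),    # (height, width)
    "clip_length": 7,            # frames per training clip
    "unrolled_iterations": 15,   # update-operator iterations unrolled per clip
    "num_gpus": 4,               # RTX 3090s; roughly 1 week of training
}
```

The evaluation quoted in the Research Type row reports Absolute Trajectory Error (ATE) [43]. As a minimal sketch of what that metric computes, the NumPy snippet below rigidly aligns an estimated trajectory to the ground truth (Umeyama, 1991) and reports the RMSE of the residual positions. The function names are ours, not the paper's, and the standard monocular protocol additionally estimates a scale factor (Sim(3) alignment), which is omitted here for brevity.

```python
import numpy as np

def align_se3(est, gt):
    """Closed-form rigid (SE(3)) alignment of est onto gt (Umeyama, 1991).

    est, gt: (N, 3) arrays of camera positions. Returns (R, t) such that
    R @ est[i] + t best matches gt[i] in the least-squares sense.
    """
    mu_est, mu_gt = est.mean(axis=0), gt.mean(axis=0)
    cov = (gt - mu_gt).T @ (est - mu_est) / est.shape[0]
    U, _, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # reflection guard
    R = U @ S @ Vt
    t = mu_gt - R @ mu_est
    return R, t

def ate_rmse(est, gt):
    """Absolute Trajectory Error: RMSE over rigidly aligned positions."""
    R, t = align_se3(est, gt)
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```
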
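A quick self-check on synthetic data: the estimated trajectory below is the ground truth after a fabricated rotation, translation, and small noise, so the reported ATE should be close to the noise level.

```python
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.standard_normal((100, 3))
    theta = 0.3  # fabricate a rotated, shifted, noisy estimate
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    est = gt @ Rz.T + np.array([1.0, -2.0, 0.5])
    est += 0.01 * rng.standard_normal((100, 3))
    print(f"ATE-RMSE: {ate_rmse(est, gt):.4f} m")  # ~0.017 m expected
```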