DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras

Authors: Zachary Teed, Jia Deng

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive evaluation across four different datasets and three different sensor modalities, demonstrating state-of-the-art performance in all cases. We also include ablation studies that shed light on important design decisions and hyperparameters. Section 4 (Experiments): We experiment on a diverse set of datasets and sensor modalities. We compare to both deep learning and established classical SLAM algorithms and put specific emphasis on cross-dataset generalization. Following prior work, we evaluate the accuracy of the camera trajectory [31, 15, 41], primarily using Absolute Trajectory Error (ATE) [43].
Researcher Affiliation | Academia | Zachary Teed, Jia Deng. Princeton University. {zteed,jiadeng}@princeton.edu
Pseudocode | No | No structured pseudocode or algorithm blocks found.
Open Source Code | Yes | The URL to our open source code is https://github.com/princeton-vl/DROID-SLAM.
Open Datasets | Yes | Our network is trained entirely on monocular video from the synthetic TartanAir dataset [54]. [54] W. Wang, D. Zhu, X. Wang, Y. Hu, Y. Qiu, C. Wang, Y. Hu, A. Kapoor, and S. Scherer. Tartanair: A dataset to push the limits of visual slam. arXiv preprint arXiv:2003.14338, 2020.
Dataset Splits | No | The paper mentions 'test split' and 'training set' but does not explicitly provide details for a validation split or percentages for train/validation/test splits in the main text.
Hardware Specification | Yes | Training takes 1 week on 4 RTX-3090 GPUs. Our system can run in real-time with 2 3090 GPUs. All results on TUM-RGBD can be produced on a single 1080Ti graphics card.
Software Dependencies | No | The paper mentions "PyTorch" and "LieTorch" but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We train our network for 250k steps with a batch size of 4, resolution 384 × 512, and 7 frame clips, and unroll 15 update iterations.
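
The Research Type entry names Absolute Trajectory Error (ATE) as the primary metric. The following is a minimal sketch of how an ATE RMSE is commonly computed, not the authors' evaluation code. It assumes the estimated and ground-truth camera positions are already time-associated N×3 arrays. The function names `align_umeyama` and `ate_rmse` are hypothetical, and the alignment uses the standard Umeyama least-squares similarity fit. Scale is typically included in the alignment for monocular trajectories, since monocular SLAM cannot observe absolute scale.

```python
import numpy as np


def align_umeyama(est, gt, with_scale=True):
    """Least-squares similarity transform (s, R, t) mapping est onto gt.

    est, gt: (N, 3) arrays of corresponding camera positions.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    n = len(est)

    # 3x3 cross-covariance between ground truth and estimate.
    cov = g.T @ e / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    # Optimal scale; fixed to 1 for a rigid (SE(3)) alignment.
    var_e = (e ** 2).sum() / n
    s = float(np.trace(np.diag(D) @ S) / var_e) if with_scale else 1.0

    t = mu_g - s * (R @ mu_e)
    return s, R, t


def ate_rmse(est, gt, with_scale=True):
    """Root-mean-square position error after aligning est to gt."""
    s, R, t = align_umeyama(est, gt, with_scale)
    aligned = s * (est @ R.T) + t
    errors = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```

With `with_scale=False` the fit is rigid, which is the usual choice when metric scale is observable, as with stereo or RGB-D input.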