SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation
Authors: Youhong Wang, Yunji Liang, Hao Xu, Shaohui Jiao, Hongkai Yu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on KITTI and Cityscapes show that our method attains remarkable state-of-the-art performance |
| Researcher Affiliation | Collaboration | Youhong Wang¹,², Yunji Liang¹*, Hao Xu², Shaohui Jiao², Hongkai Yu³ (¹Northwestern Polytechnical University, ²Bytedance Inc, ³Cleveland State University) |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available at https://github.com/hisfog/SfMNeXt-Impl. |
| Open Datasets | Yes | KITTI (Geiger et al. 2013) is a dataset that provides stereo image sequences, which is commonly used for self-supervised monocular depth estimation. Cityscapes (Cordts et al. 2016) is a challenging dataset which contains numerous moving objects. Make3D (Saxena, Sun, and Ng 2008): to evaluate the generalization ability of SQLdepth, we use the KITTI-pretrained SQLdepth to perform zero-shot evaluation on the Make3D dataset, and provide additional depth map visualizations. (A zero-shot evaluation sketch follows the table.) |
| Dataset Splits | No | The paper mentions using the 'Eigen test split' for KITTI and refers to standard datasets, but does not give explicit train/validation/test percentages or sample counts needed to reproduce the splits, and does not identify a specific validation split. |
| Hardware Specification | Yes | The model is trained on 3 NVIDIA V100 GPUs, with a batch size of 16. |
| Software Dependencies | No | The paper states 'Our method is implemented using Pytorch framework (Paszke et al. 2019)' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The model is trained on 3 NVIDIA V100 GPUs, with a batch size of 16. Following the settings from (Godard et al. 2019), we use color and flip augmentations on images during training. We jointly train both DepthNet and PoseNet with the Adam Optimizer (Kingma and Ba 2014) with β1 = 0.9, β2 = 0.999. The initial learning rate is set to 1e-4 and decays to 1e-5 after 15 epochs. We set the SSIM weight to α = 0.85 and smooth loss term weight to λ = 1e-3. We use the ResNet-50 (He et al. 2016) with ImageNet (Russakovsky et al. 2015) pretrained weights as backbone, as the other baselines do. (A hedged configuration sketch follows the table.) |
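
For concreteness, here is a minimal PyTorch sketch of the training configuration the Experiment Setup row reports. Only the hyperparameters (Adam betas, the 1e-4 → 1e-5 learning-rate decay at epoch 15, the loss weights α and λ, the batch size, and the ImageNet-pretrained ResNet-50 backbone) come from the paper; `depth_net` and `pose_net` are hypothetical placeholders, the per-pixel SSIM module is not shown, and multi-GPU distribution is omitted. This is an illustration of the stated settings, not the released implementation.

```python
# Sketch of the reported training setup. depth_net and pose_net are
# placeholders; only the hyperparameters below come from the paper.
import torch
import torch.nn as nn
import torchvision.models as models

SSIM_WEIGHT = 0.85    # α: weight of the SSIM term in the photometric loss
SMOOTH_WEIGHT = 1e-3  # λ: weight of the edge-aware smoothness term
BATCH_SIZE = 16       # reported batch size (trained on 3 NVIDIA V100 GPUs)

# ResNet-50 backbone with ImageNet-pretrained weights, as reported.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

depth_net = backbone                        # placeholder for the full DepthNet
pose_net = nn.Conv2d(6, 6, kernel_size=1)   # placeholder for the PoseNet

# DepthNet and PoseNet are optimized jointly with Adam (β1 = 0.9, β2 = 0.999).
optimizer = torch.optim.Adam(
    list(depth_net.parameters()) + list(pose_net.parameters()),
    lr=1e-4,
    betas=(0.9, 0.999),
)

# Learning rate decays from 1e-4 to 1e-5 (factor of 10) after 15 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[15], gamma=0.1
)

def photometric_loss(pred, target, ssim_map):
    """Monodepth2-style reconstruction loss: α·(1−SSIM)/2 + (1−α)·L1.

    ssim_map is assumed to be a per-pixel SSIM map in [0, 1] produced by
    a separate SSIM module (not shown here).
    """
    l1 = (pred - target).abs().mean(1, keepdim=True)
    return SSIM_WEIGHT * (1.0 - ssim_map) / 2.0 + (1.0 - SSIM_WEIGHT) * l1
```

The `MultiStepLR` schedule with `gamma=0.1` reproduces the reported 1e-4 → 1e-5 decay exactly; the smoothness term weighted by `SMOOTH_WEIGHT` would be added to the photometric loss in the full training loop.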
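
The Make3D protocol quoted in the Open Datasets row amounts to running a KITTI-trained model on Make3D images without any fine-tuning. Below is a minimal, self-contained sketch of that zero-shot evaluation using the standard absolute-relative-error metric; the model and the data tensors are dummy stand-ins (the paper does not describe its evaluation code), and the median scaling step is the convention commonly used for self-supervised monocular depth, not a detail confirmed by the paper.

```python
# Hedged sketch of zero-shot depth evaluation. The model and tensors are
# dummy stand-ins; only the protocol (no fine-tuning, abs-rel metric) is real.
import torch
import torch.nn as nn

def abs_rel(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Absolute relative error over valid (positive-depth) pixels."""
    valid = gt > 0
    return ((pred[valid] - gt[valid]).abs() / gt[valid]).mean()

# Stand-in for a KITTI-pretrained depth network (hypothetical).
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
model.eval()

# Stand-in for one Make3D image and its ground-truth depth (hypothetical),
# with depths up to 70 m, the usual Make3D evaluation cap.
image = torch.rand(1, 3, 192, 640)
gt_depth = torch.rand(1, 1, 192, 640) * 70.0

with torch.no_grad():
    pred = model(image).clamp(min=1e-3)  # keep predicted depths positive
    # Median scaling, customary for self-supervised monocular depth.
    pred = pred * (gt_depth[gt_depth > 0].median() / pred.median())
    print(f"zero-shot abs-rel: {abs_rel(pred, gt_depth).item():.3f}")
```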