SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation
Authors: Bing Li, Cheng Zheng, Silvio Giancola, Bernard Ghanem
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our proposed approach achieves a new state of the art in scene flow estimation. Our approach achieves an error of 0.038 and 0.037 (EPE3D) on FlyingThings3D and KITTI Scene Flow respectively, which significantly outperforms previous methods by large margins. |
| Researcher Affiliation | Academia | King Abdullah University of Science and Technology {bing.li, cheng.zheng, silvio.giancola, Bernard.Ghanem}@kaust.edu.sa |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We conduct our experiments on two datasets that are widely used to evaluate scene flow. FlyingThings3D (Mayer et al. 2016a) is a large-scale synthetic stereo video dataset... KITTI Scene Flow (Menze, Heipke, and Geiger 2018; Choy, Gwak, and Savarese 2019) is a real-world LiDAR scan dataset... |
| Dataset Splits | Yes | We use the same 19640/3824 pairs of point clouds (training/testing) used in the related works (Puy, Boulch, and Marlet 2020; Gu et al. 2019; Wu et al. 2020). |
| Hardware Specification | Yes | The runtime of FLOT and our SCTN are evaluated on a single GTX 2080Ti GPU. |
| Software Dependencies | No | We implement our method in PyTorch (Paszke et al. 2019). While PyTorch is mentioned, a specific version number is not provided, nor are other software dependencies with their versions. |
| Experiment Setup | Yes | We minimize a cumulative loss E = E_s + λE_c with λ = 0.30, a weight that scales the losses. We use the Adam optimizer (Kingma and Ba 2014) with an initial learning rate of 10^-3, which is dropped to 10^-4 after the 50th epoch. First, we train for 40 epochs using only the supervised loss. Then we continue training for 20 epochs with both the supervised loss and the FSC loss, for a total of 60 epochs. We use a voxel size of 0.07 m. |
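
As a sanity check on the setup quoted above, here is a minimal PyTorch training-loop sketch. Since the authors release no code, `SCTNStub`, `supervised_loss`, `fsc_loss`, and the dummy loader are all hypothetical placeholders; only the hyperparameters (λ = 0.30, learning rate 10^-3 dropped to 10^-4 after the 50th epoch, 40 supervised-only epochs followed by 20 epochs with the FSC loss) come from the paper's text.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the (unreleased) SCTN model: a single linear
# layer mapping each point to a 3D "flow" vector, just to exercise the loop.
class SCTNStub(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(3, 3)

    def forward(self, pc1, pc2):
        return self.head(pc1)

def supervised_loss(pred_flow, gt_flow):
    # Stand-in for E_s: mean end-point error between predicted and GT flow.
    return (pred_flow - gt_flow).norm(dim=-1).mean()

def fsc_loss(pred_flow, pc1):
    # Stand-in for the FSC term E_c; the real formulation is in the paper.
    return pred_flow.pow(2).mean()

# Dummy loader: random point-cloud pairs with ground-truth flow.
loader = [(torch.randn(2048, 3), torch.randn(2048, 3), torch.randn(2048, 3))
          for _ in range(4)]

model = SCTNStub()
lam = 0.30  # weight lambda on the FSC loss, per the paper
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Drop the learning rate from 1e-3 to 1e-4 after the 50th epoch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[50], gamma=0.1)

for epoch in range(60):  # 40 supervised-only epochs + 20 with the FSC loss
    for pc1, pc2, gt_flow in loader:
        pred_flow = model(pc1, pc2)
        loss = supervised_loss(pred_flow, gt_flow)       # E_s
        if epoch >= 40:                                  # E = E_s + lambda * E_c
            loss = loss + lam * fsc_loss(pred_flow, pc1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Note that the stub omits the paper's 0.07 m voxelization of the input points for the sparse-convolution branch; reproducing that step would require a sparse-convolution library, which the paper does not name with a version.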