SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation
Authors: Bing Li, Cheng Zheng, Silvio Giancola, Bernard Ghanem
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our proposed approach achieves a new state of the art in scene flow estimation. Our approach achieves an error of 0.038 and 0.037 (EPE3D) on FlyingThings3D and KITTI Scene Flow respectively, which significantly outperforms previous methods by large margins. |
| Researcher Affiliation | Academia | King Abdullah University of Science and Technology {bing.li, cheng.zheng, silvio.giancola, Bernard.Ghanem}@kaust.edu.sa |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We conduct our experiments on two datasets that are widely used to evaluate scene flow. FlyingThings3D (Mayer et al. 2016a) is a large-scale synthetic stereo video dataset... KITTI Scene Flow (Menze, Heipke, and Geiger 2018; Choy, Gwak, and Savarese 2019) is a real-world LiDAR scan dataset... |
| Dataset Splits | Yes | We use the same 19640/3824 pairs of point clouds (training/testing) used in the related works (Puy, Boulch, and Marlet 2020; Gu et al. 2019; Wu et al. 2020). |
| Hardware Specification | Yes | The runtime of FLOT and our SCTN are evaluated on a single GTX 2080Ti GPU. |
| Software Dependencies | No | We implement our method in PyTorch (Paszke et al. 2019). While PyTorch is mentioned, a specific version number is not provided, nor are other software dependencies with their versions. |
| Experiment Setup | Yes | We minimize a cumulative loss E = E_s + λE_c with λ = 0.30, a weight that scales the losses. We use the Adam optimizer (Kingma and Ba 2014) with an initial learning rate of 10^-3, which is dropped to 10^-4 after the 50th epoch. First, we train for 40 epochs using only the supervised loss. Then we continue training for 20 epochs with both the supervised loss and the FSC loss, for a total of 60 epochs. We use a voxel size of 0.07 m. |
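
As a sanity check on the setup quoted above, here is a minimal PyTorch training-loop sketch. Since the authors release no code, `SCTNStub`, `supervised_loss`, `fsc_loss`, and the dummy loader are all hypothetical placeholders; only the hyperparameters (λ = 0.30, learning rate 10^-3 dropped to 10^-4 after the 50th epoch, 40 supervised-only epochs followed by 20 epochs with the FSC loss) come from the paper's text.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the (unreleased) SCTN model: a single linear
# layer mapping each point to a 3D "flow" vector, just to exercise the loop.
class SCTNStub(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(3, 3)

    def forward(self, pc1, pc2):
        return self.head(pc1)

def supervised_loss(pred_flow, gt_flow):
    # Stand-in for E_s: mean end-point error between predicted and GT flow.
    return (pred_flow - gt_flow).norm(dim=-1).mean()

def fsc_loss(pred_flow, pc1):
    # Stand-in for the FSC term E_c; the real formulation is in the paper.
    return pred_flow.pow(2).mean()

# Dummy loader: random point-cloud pairs with ground-truth flow.
loader = [(torch.randn(2048, 3), torch.randn(2048, 3), torch.randn(2048, 3))
          for _ in range(4)]

model = SCTNStub()
lam = 0.30  # weight lambda on the FSC loss, per the paper
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Drop the learning rate from 1e-3 to 1e-4 after the 50th epoch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[50], gamma=0.1)

for epoch in range(60):  # 40 supervised-only epochs + 20 with the FSC loss
    for pc1, pc2, gt_flow in loader:
        pred_flow = model(pc1, pc2)
        loss = supervised_loss(pred_flow, gt_flow)       # E_s
        if epoch >= 40:                                  # E = E_s + lambda * E_c
            loss = loss + lam * fsc_loss(pred_flow, pc1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Note that the stub omits the paper's 0.07 m voxelization of the input points for the sparse-convolution branch; reproducing that step would require a sparse-convolution library, which the paper does not name with a version.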