SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model

Authors: Shili Zhou, Ruian He, Weimin Tan, Bo Yan

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our proposed SAMFlow model reaches 0.86/2.10 clean/final EPE and 3.55/12.32 EPE/F1-all on the Sintel and KITTI-15 training sets, surpassing FlowFormer by 8.5%/9.9% and 13.2%/16.3%. Furthermore, our model achieves state-of-the-art performance on the Sintel and KITTI-15 benchmarks, ranking #1 among all two-frame methods on the Sintel clean pass."
Researcher Affiliation | Academia | School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University. Contact: slzhou19@fudan.edu.cn, rahe16@fudan.edu.cn, wmtan@fudan.edu.cn, byan@fudan.edu.cn
Pseudocode | No | "Overall, this module can be represented by Formula 4, 5, 6 and 7."
Open Source Code | No | No explicit statement or link to open-source code was found in the paper.
Open Datasets | Yes | "With the above designs, our proposed SAMFlow achieves remarkable performance, reaching 0.86/2.10 clean/final EPE on the Sintel (Butler et al. 2012) training set and 3.55/12.32 EPE/F1-all on the KITTI-15 (Geiger et al. 2013) training set."
Dataset Splits | No | "Training Settings: We follow the setup of previous work (Huang et al. 2022a) and divide the training into two stages: the C+T stage and the C+T+S+K+H stage. To speed up training, we skip the stage of training on the Chairs dataset by using the FlowFormer-things checkpoint as initialization, and the SAM encoder is kept frozen during training."
Hardware Specification | No | "Figure 7: Runtime and accuracy comparison between FlowFormer, FlowFormer++, and our models with different SAM encoders, including SAM-B, SAM-H, and MobileSAM (MSAM). The x-axis is the average time of 100 runs of 384 × 1024 inputs, and the y-axis is the F1 score on KITTI."
Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper.
Experiment Setup | No | "Training Settings: We follow the setup of previous work (Huang et al. 2022a) and divide the training into two stages: the C+T stage and the C+T+S+K+H stage. To speed up training, we skip the stage of training on the Chairs dataset by using the FlowFormer-things checkpoint as initialization, and the SAM encoder is kept frozen during training. Test Settings: For testing, we adopt the tiling strategy (Jaegle et al. 2021) to bridge the resolution gap between training and testing data."
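
The table's headline numbers are end-point error (EPE, the mean per-pixel L2 distance between predicted and ground-truth flow) and KITTI's F1-all (the percentage of pixels whose error exceeds both 3 px and 5% of the ground-truth magnitude). A minimal PyTorch sketch of both metrics; the function names are ours, and KITTI's valid-pixel mask is omitted for brevity:

```python
import torch

def epe(pred, gt):
    # End-point error: mean per-pixel L2 distance between flows.
    # pred, gt: (2, H, W) tensors holding the (u, v) flow components.
    return torch.norm(pred - gt, p=2, dim=0).mean()

def f1_all(pred, gt):
    # KITTI outlier rate: a pixel is an outlier if its error exceeds
    # both 3 px and 5% of the ground-truth flow magnitude.
    err = torch.norm(pred - gt, p=2, dim=0)
    mag = torch.norm(gt, p=2, dim=0)
    outliers = (err > 3.0) & (err > 0.05 * mag)
    return 100.0 * outliers.float().mean()  # reported as a percentage
```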
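
The Dataset Splits and Experiment Setup rows quote the same two-stage recipe: skip Chairs pre-training by initializing from a FlowFormer-things checkpoint, and keep the SAM encoder frozen throughout. Since no code was released, the module and file names below are stand-ins; this is only a sketch of the mechanics, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SAMFlowStub(nn.Module):
    # Stand-in for the real model: a SAM-style image encoder that stays
    # frozen, plus trainable flow-estimation layers.
    def __init__(self):
        super().__init__()
        self.sam_encoder = nn.Conv2d(3, 64, 3, padding=1)  # placeholder for the SAM encoder
        self.flow_layers = nn.Conv2d(64, 2, 3, padding=1)  # placeholder for the flow branch

model = SAMFlowStub()

# Initialize from a FlowFormer-things checkpoint (file name hypothetical);
# strict=False because the checkpoint holds no weights for the SAM branch.
# state = torch.load("flowformer_things.pth", map_location="cpu")
# model.load_state_dict(state, strict=False)

# Keep the SAM encoder frozen in both the C+T and C+T+S+K+H stages.
for p in model.sam_encoder.parameters():
    p.requires_grad = False

# Optimize only the trainable parameters (learning rate is a placeholder).
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```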
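
The Figure 7 caption quoted under Hardware Specification states the timing protocol (average over 100 runs at 384 × 1024) but names no GPU, hence the "No" verdict. A sketch of that protocol, assuming a CUDA device and a model that takes a frame pair:

```python
import time
import torch

@torch.no_grad()
def mean_runtime(model, runs=100, size=(384, 1024), device="cuda"):
    # Average forward time over `runs` passes on a random 384x1024 frame
    # pair, mirroring the protocol described for Figure 7.
    model = model.to(device).eval()
    img1 = torch.randn(1, 3, *size, device=device)
    img2 = torch.randn(1, 3, *size, device=device)
    for _ in range(10):          # warm-up so CUDA init is not timed
        model(img1, img2)
    torch.cuda.synchronize()     # kernels are async; sync before timing
    start = time.perf_counter()
    for _ in range(runs):
        model(img1, img2)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs
```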
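
Finally, the Test Settings quote adopts the tiling strategy of Jaegle et al. (2021) to bridge the gap between training crops and full-resolution test frames. A minimal sketch with uniform blending over tile overlaps; the published strategy uses smoother weights, and the tile and stride sizes here are purely illustrative:

```python
import torch

def _starts(full, tile, stride):
    # Tile start offsets that cover an axis of length `full`.
    if full <= tile:
        return [0]
    return list(range(0, full - tile, stride)) + [full - tile]

@torch.no_grad()
def tiled_flow(model, img1, img2, tile=(384, 512), stride=(256, 384)):
    # Predict flow on overlapping tiles and average the overlapped regions.
    # `model(img1, img2)` is assumed to return a (B, 2, h, w) flow map at
    # input resolution.
    _, _, H, W = img1.shape
    th, tw = tile
    flow = torch.zeros(img1.shape[0], 2, H, W, device=img1.device)
    count = torch.zeros(1, 1, H, W, device=img1.device)
    for y in _starts(H, th, stride[0]):
        for x in _starts(W, tw, stride[1]):
            f = model(img1[..., y:y+th, x:x+tw], img2[..., y:y+th, x:x+tw])
            flow[..., y:y+th, x:x+tw] += f
            count[..., y:y+th, x:x+tw] += 1.0
    return flow / count
```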