SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model
Authors: Shili Zhou, Ruian He, Weimin Tan, Bo Yan
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed SAMFlow model reaches 0.86/2.10 clean/final EPE and 3.55/12.32 EPE/F1-all on the Sintel and KITTI-15 training sets, surpassing FlowFormer by 8.5%/9.9% and 13.2%/16.3%. Furthermore, our model achieves state-of-the-art performance on the Sintel and KITTI-15 benchmarks, ranking #1 among all two-frame methods on the Sintel clean pass. (The EPE/F1-all metrics are sketched after the table.) |
| Researcher Affiliation | Academia | School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University slzhou19@fudan.edu.cn, rahe16@fudan.edu.cn, wmtan@fudan.edu.cn, byan@fudan.edu.cn |
| Pseudocode | No | No pseudocode or algorithm listing is provided; the modules are described only through numbered equations: "Overall, this module can be represented by Formula 4, 5, 6 and 7." |
| Open Source Code | No | No explicit statement or link for open-source code was found in the paper. |
| Open Datasets | Yes | With the above designs, our proposed SAMFlow achieves remarkable performance, reaching 0.86/2.10 clean/final EPE on the Sintel (Butler et al. 2012) training set and 3.55/12.32 EPE/F1-all on the KITTI-15 (Geiger et al. 2013) training set. |
| Dataset Splits | No | Training Settings: We follow the setup of previous work (Huang et al. 2022a) and divide the training into two stages: C+T-Stage and C+T+S+K+H-Stage. To speed up training, we skip the stage of training on the Chairs dataset by using the FlowFormer-things checkpoint as initialization, and the SAM encoder is kept frozen during training. |
| Hardware Specification | No | Figure 7: Runtime and accuracy comparison between FlowFormer, FlowFormer++, and our models with different SAM encoders, including SAM-B, SAM-H, and MobileSAM (MSAM). The x-axis is the average time of 100 runs of 384 × 1024 inputs, and the y-axis is the F1 score on KITTI. (The GPU is not named; a runtime-measurement sketch following this protocol appears after the table.) |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper. |
| Experiment Setup | No | Training Settings: We follow the setup of previous work (Huang et al. 2022a) and divide the training into two stages: C+T-Stage and C+T+S+K+H-Stage. To speed up training, we skip the stage of training on the Chairs dataset by using the FlowFormer-things checkpoint as initialization, and the SAM encoder is kept frozen during training. Test Settings: For testing, we adopt the tiling strategy (Jaegle et al. 2021) to bridge the resolution gap between training and testing data. (Training-freeze and tiled-inference sketches follow the table.) |
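
The EPE and F1-all numbers quoted in the table are standard optical-flow metrics rather than anything specific to SAMFlow. Below is a minimal sketch of how they are conventionally computed; the 3 px / 5% outlier rule follows the usual KITTI definition, and the function names are illustrative, not taken from the paper (which releases no code).

```python
import numpy as np

def end_point_error(flow_pred, flow_gt):
    """Average end-point error (EPE): mean Euclidean distance between
    predicted and ground-truth flow vectors, in pixels.
    Both arrays are assumed to have shape (H, W, 2)."""
    return np.linalg.norm(flow_pred - flow_gt, axis=-1).mean()

def f1_all(flow_pred, flow_gt, abs_thresh=3.0, rel_thresh=0.05):
    """KITTI F1-all: percentage of pixels whose flow error exceeds both
    3 px and 5% of the ground-truth flow magnitude."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    mag = np.linalg.norm(flow_gt, axis=-1)
    outliers = (err > abs_thresh) & (err > rel_thresh * mag)
    return 100.0 * outliers.mean()
```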
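The Training Settings passage quoted under Dataset Splits and Experiment Setup states that training starts from a FlowFormer-things checkpoint and keeps the SAM encoder frozen. Since no code is released, the sketch below only illustrates that pattern in PyTorch with placeholder modules; the class, layer shapes, and checkpoint path are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins: the real SAMFlow modules are not public, so a
# trivial encoder/flow-branch pair is used to show the freezing pattern.
class SAMFlowSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.sam_encoder = nn.Conv2d(3, 256, kernel_size=16, stride=16)  # stands in for the SAM ViT encoder
        self.flow_branch = nn.Conv2d(256, 2, kernel_size=3, padding=1)   # stands in for the FlowFormer branch

model = SAMFlowSketch()

# Initialize the flow branch from a FlowFormer checkpoint trained on
# FlyingThings (path is illustrative), skipping the FlyingChairs stage.
# state = torch.load("flowformer_things.pth", map_location="cpu")
# model.flow_branch.load_state_dict(state, strict=False)

# Keep the SAM image encoder frozen during both training stages.
for p in model.sam_encoder.parameters():
    p.requires_grad = False

# Only the remaining trainable parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)
```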
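For the test-time tiling strategy (Jaegle et al. 2021) mentioned under Experiment Setup, a generic sketch of overlapping-tile inference with uniform blending is given below; the tile and stride sizes and the `model(img1, img2)` calling convention are assumptions rather than the paper's actual settings.

```python
import torch

def tiled_flow_inference(model, img1, img2, tile=(432, 960), stride=(216, 480)):
    """Run a two-frame flow model on overlapping tiles of a larger image
    and average the overlapping predictions. Tile/stride values are illustrative."""
    _, _, H, W = img1.shape
    th, tw = min(tile[0], H), min(tile[1], W)
    flow_sum = torch.zeros(1, 2, H, W)
    count = torch.zeros(1, 1, H, W)
    ys = list(range(0, H - th + 1, stride[0]))
    xs = list(range(0, W - tw + 1, stride[1]))
    if ys[-1] != H - th:
        ys.append(H - th)  # ensure the last tile reaches the bottom edge
    if xs[-1] != W - tw:
        xs.append(W - tw)  # ...and the right edge
    with torch.no_grad():
        for y in ys:
            for x in xs:
                flow = model(img1[:, :, y:y + th, x:x + tw],
                             img2[:, :, y:y + th, x:x + tw])  # (1, 2, th, tw)
                flow_sum[:, :, y:y + th, x:x + tw] += flow
                count[:, :, y:y + th, x:x + tw] += 1.0
    return flow_sum / count

# Example with a dummy model that returns zero flow:
dummy = lambda a, b: torch.zeros(a.shape[0], 2, a.shape[2], a.shape[3])
flow = tiled_flow_inference(dummy, torch.randn(1, 3, 436, 1024), torch.randn(1, 3, 436, 1024))
```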
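Finally, the figure caption quoted under Hardware Specification describes the runtime protocol (average over 100 runs at 384 × 1024) but not the GPU. A common way to reproduce that style of measurement in PyTorch is sketched below; the warm-up count and device handling are assumptions, not details from the paper.

```python
import time
import torch

def average_runtime(model, runs=100, size=(384, 1024), device="cuda"):
    """Average forward-pass time over `runs` inferences at the given
    resolution, mirroring the 100-run, 384x1024 protocol in the caption."""
    img1 = torch.randn(1, 3, *size, device=device)
    img2 = torch.randn(1, 3, *size, device=device)
    model = model.to(device).eval()
    with torch.no_grad():
        for _ in range(10):  # warm-up to exclude one-off setup costs
            model(img1, img2)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(img1, img2)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs
```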