StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences

Authors: Shangkun Sun, Jiaming Liu, Huaxia Li, Guoqing Liu, Thomas Li, Wei Gao

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this study, we evaluate StreamFlow on the Sintel [3], KITTI [31], and Spring [30] datasets, following previous works [44, 9, 7]. From Table 1 and Table 4.1, we can see that StreamFlow achieves strong zero-shot performance on Sintel and KITTI.
Researcher Affiliation | Collaboration | Shangkun Sun, SECE, Peking University / Peng Cheng Laboratory, sunshk@stu.pku.edu.cn; Jiaming Liu, Tiamat AI, james.liu.n1@gmail.com; Huaxia Li, Xiaohongshu Inc., lihx0610@gmail.com; Guoqing Liu, Minieye Inc., liugq@ntu.edu.sg; Thomas H. Li, SECE, Peking University, thomas@pku.edu.cn; Wei Gao (corresponding author), SECE, Peking University / Peng Cheng Laboratory, gaowei262@pku.edu.cn
Pseudocode | Yes | Algorithm 1: Pairwise Multi-frame Estimation; Algorithm 2: StreamFlow Multi-frame Estimation (a hypothetical sketch contrasting the two estimation loops appears after this table).
Open Source Code | Yes | The code is available here. (From NeurIPS checklist Q5) The code, pre-trained models, and visualization scripts are now available on paperswithcode.com and github.com.
Open Datasets | Yes | In this study, we evaluate StreamFlow on the Sintel [3], KITTI [31], and Spring [30] datasets, following previous works [44, 9, 7]. In previous works, models are first pre-trained on the FlyingChairs [8] and FlyingThings [29] datasets using the "C+T" schedule and then fine-tuned using the "C+T+S+K+H" schedule on the Sintel and KITTI datasets. Specifically, for Sintel, models are trained on a combination of FlyingThings, Sintel, KITTI, and HD1K [17].
Dataset Splits | No | The paper describes training and testing on various datasets and fine-tuning. However, it does not explicitly specify the use of a validation set or provide details on how a validation split was created or used for hyperparameter tuning or early stopping; it focuses on train and test phases.
Hardware Specification | Yes | Our StreamFlow method is built with the PyTorch [34] library, and our experiments are conducted on NVIDIA A100 GPUs.
Software Dependencies | Yes | Our StreamFlow method is built with the PyTorch [34] library, and our experiments are conducted on NVIDIA A100 GPUs. During training, we adopt the AdamW [24] optimizer and the one-cycle learning rate policy [40], following previous works [45, 15, 44]. With PyTorch 2.2 and FlashAttention, using 12 refinements and 4 frames, the GPU memory usage for StreamFlow is shown in Table ??.
Experiment Setup | Yes | During training, we adopt the AdamW [24] optimizer and the one-cycle learning rate policy [40], following previous works [45, 15, 44]. The number of refinements in the decoder is set to 12, following previous works. In practice, N is set to 12 and θ is set to 0.8, the same as previous works [38, 45, 44, 15], for a fair comparison. Given the absence of multi-frame information in the Chairs dataset, we follow VideoFlow [38] and train directly on FlyingThings in the first stage. For the Spring dataset, we follow the settings of MemFlow [7] and fine-tune the model for 180k steps. The remaining training configurations are consistent with prior works [38, 44, 15, 45]. The temporal and non-temporal modeling modules are trained concurrently. (A minimal sketch of this AdamW/one-cycle setup also follows the table.)
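
The two pseudocode listings referenced in the Pseudocode row contrast a per-pair estimation loop with StreamFlow's single-pass multi-frame estimation. The sketch below only illustrates that control-flow difference under assumed interfaces: `pairwise_estimation`, `streamlined_estimation`, and the toy stand-in models are hypothetical placeholders, not the authors' implementation.

```python
import torch


def pairwise_estimation(frames, pair_model):
    """Algorithm 1 style: run the model once per adjacent frame pair.

    frames: tensor of shape (T, C, H, W); returns a list of T-1 flow fields.
    Each pair is encoded and decoded from scratch, so overlapping frames
    are re-processed multiple times across the sequence.
    """
    flows = []
    for t in range(frames.shape[0] - 1):
        flows.append(pair_model(frames[t:t + 1], frames[t + 1:t + 2]))
    return flows


def streamlined_estimation(frames, clip_model):
    """Algorithm 2 style: one forward pass over the whole T-frame clip.

    The clip is processed jointly, so per-frame features are computed once
    and all T-1 flows are refined together.
    """
    return clip_model(frames.unsqueeze(0))  # (1, T, C, H, W) -> (1, T-1, 2, H, W)


if __name__ == "__main__":
    # Toy stand-ins that only mimic the call patterns and output shapes.
    T, C, H, W = 4, 3, 64, 64
    clip = torch.randn(T, C, H, W)
    toy_pair = lambda a, b: torch.zeros(1, 2, H, W)
    toy_clip = lambda x: torch.zeros(1, x.shape[1] - 1, 2, H, W)
    print(len(pairwise_estimation(clip, toy_pair)))      # 3 pairwise flows
    print(streamlined_estimation(clip, toy_clip).shape)  # torch.Size([1, 3, 2, 64, 64])
```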
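
The Experiment Setup row quotes AdamW with a one-cycle learning-rate policy and 12 decoder refinements. Below is a minimal PyTorch sketch of that optimizer/schedule recipe; the stand-in model, learning rate, and weight decay are assumed values for illustration, and only the 180k-step count comes from the quoted Spring fine-tuning setting.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

# Stand-in network; the real model is StreamFlow's multi-frame flow estimator.
model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)

num_steps = 180_000  # quoted Spring fine-tuning length; other stages differ
base_lr = 2.5e-4     # assumed value, not reported in the excerpt above

optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=1e-4)
scheduler = OneCycleLR(
    optimizer,
    max_lr=base_lr,
    total_steps=num_steps,
    pct_start=0.05,              # short warm-up, then a single decaying cycle
    anneal_strategy="linear",
)

for step in range(3):            # loop truncated; a real run iterates num_steps times
    optimizer.zero_grad()
    frames = torch.randn(1, 3, 64, 64)
    loss = model(frames).abs().mean()  # placeholder loss, not the paper's flow loss
    loss.backward()
    optimizer.step()
    scheduler.step()             # the one-cycle policy is stepped once per iteration
```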