StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences
Authors: Shangkun Sun, Jiaming Liu, Huaxia Li, Guoqing Liu, Thomas Li, Wei Gao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we evaluate StreamFlow on the Sintel [3], KITTI [31], and Spring [30] datasets, following previous works [44, 9, 7]. From Table 1 and Table 4.1, we can see that StreamFlow achieves advanced zero-shot performance on Sintel and KITTI. |
| Researcher Affiliation | Collaboration | Shangkun Sun, SECE, Peking University & Peng Cheng Laboratory, sunshk@stu.pku.edu.cn; Jiaming Liu, Tiamat AI, james.liu.n1@gmail.com; Huaxia Li, Xiaohongshu Inc., lihx0610@gmail.com; Guoqing Liu, Minieye Inc., liugq@ntu.edu.sg; Thomas H Li, SECE, Peking University, thomas@pku.edu.cn; Wei Gao, SECE, Peking University & Peng Cheng Laboratory, gaowei262@pku.edu.cn |
| Pseudocode | Yes | Algorithm 1 Pairwise Multi-frame Estimation; Algorithm 2 StreamFlow Multi-frame Estimation |
| Open Source Code | Yes | The code is available here. (From NeurIPS checklist Q5) The code, pre-trained models, and visualization scripts can be seen in paperswithcode.com and github.com now. |
| Open Datasets | Yes | In this study, we evaluate StreamFlow on the Sintel [3], KITTI [31], and Spring [30] datasets, following previous works [44, 9, 7]. In previous works, models are initially pre-trained on the FlyingChairs [8] and FlyingThings [29] datasets using the "C+T" schedule and are subsequently fine-tuned using the "C+T+S+K+H" schedule on the Sintel and KITTI datasets. Specifically, for Sintel, models are trained on a combination of FlyingThings, Sintel, KITTI, and HD1K [17]. |
| Dataset Splits | No | The paper describes training and testing on various datasets and fine-tuning. However, it does not explicitly specify the use of a 'validation set' or provide details on how a validation split was created or used for hyperparameter tuning or early stopping. It focuses on 'train' and 'test' phases. |
| Hardware Specification | Yes | Our StreamFlow method is built with the PyTorch [34] library, and our experiments are conducted on NVIDIA A100 GPUs. |
| Software Dependencies | Yes | Our StreamFlow method is built with the PyTorch [34] library, and our experiments are conducted on NVIDIA A100 GPUs. During training, we adopt the AdamW [24] optimizer and the one-cycle learning rate policy [40], following previous works [45, 15, 44]. With PyTorch 2.2 and flash-attention, using 12 refinements and 4 frames, the GPU memory usage for StreamFlow is shown in Table ??. |
| Experiment Setup | Yes | During training, we adopt the AdamW [24] optimizer and the one-cycle learning rate policy [40], following previous works [45, 15, 44]. The number of refinements in the decoder is set to 12, following previous works. In practice, N is set to 12 and θ is set to 0.8, the same as previous works [38, 45, 44, 15] for a fair comparison. Given the absence of multi-frame information in the Chairs dataset, we follow VideoFlow [38] and directly train on FlyingThings in the first stage. For the Spring dataset, we follow the settings of MemFlow [7] and fine-tune the model for 180k steps. The remaining training configurations are consistent with prior works [38, 44, 15, 45]. The temporal and non-temporal modeling modules are trained concurrently. |
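The setup rows above cite the AdamW optimizer with a one-cycle learning rate policy [40] but do not report the schedule's shape parameters. The sketch below is a minimal, dependency-free approximation of the one-cycle policy (cosine warm-up to a peak, then cosine anneal), mirroring the shape of PyTorch's `OneCycleLR`; the `pct_start`, `div_factor`, and `final_div_factor` values are hypothetical defaults, not reported by the paper.

```python
import math

def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Approximate one-cycle schedule: ramp up to max_lr, then anneal down.

    Hypothetical parameter values; the paper only names the policy [40].
    """
    initial_lr = max_lr / div_factor          # starting learning rate
    min_lr = initial_lr / final_div_factor    # final learning rate
    warmup_steps = int(pct_start * total_steps)
    if step < warmup_steps:
        # cosine ramp from initial_lr up to max_lr
        t = step / max(warmup_steps, 1)
        return initial_lr + (max_lr - initial_lr) * (1 - math.cos(math.pi * t)) / 2
    # cosine anneal from max_lr down to min_lr
    t = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return min_lr + (max_lr - min_lr) * (1 + math.cos(math.pi * t)) / 2

# The schedule peaks at max_lr ~30% of the way through training.
lrs = [one_cycle_lr(s, 1000, 2.5e-4) for s in range(1000)]
```

With PyTorch available, the equivalent would be `torch.optim.lr_scheduler.OneCycleLR` wrapped around `torch.optim.AdamW`, stepped once per iteration over the reported training steps (e.g. 180k for Spring fine-tuning).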