Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
FlowFeat: Pixel-Dense Embedding of Motion Profiles
Authors: Nikita Araslanov, Anna Sonnweber, Daniel Cremers
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, FlowFeat significantly enhances the representational power of five state-of-the-art encoders and alternative upsampling strategies across three dense tasks: video object segmentation, monocular depth estimation and semantic segmentation. |
| Researcher Affiliation | Academia | Nikita Araslanov (1, 2), Anna Sonnweber (1), Daniel Cremers (1, 2); (1) TU Munich, (2) MCML |
| Pseudocode | Yes | Figure 2: Embedding motion profiles: FlowFeat relies on the exponentially moving average (EMA) teacher model and learns to reconstruct apparent motion with a distribution of linear transformations. For a given frame I_t, we randomly sample its temporal counterpart I_t′. A pre-trained network F computes optical flow F(t → t′). We generate two overlapping random crops of frame I_t and feed the resulting views v1 and v2 to the teacher and the student networks, respectively. Obtaining the optimal linear transform A on-the-fly with ridge regression in the teacher branch, we compute the reconstruction loss w.r.t. the flow crop u2 to update the student parameters θ with gradient descent. (The figure contains a pseudocode block with 20 lines). |
| Open Source Code | Yes | Project website: https://tum-vision.github.io/flowfeat. Code and pre-trained models (Apache-2.0 License): https://github.com/tum-vision/flowfeat. |
| Open Datasets | Yes | The training data (YouTube-VOS and Kinetics-400) as well as the evaluation datasets (e.g. DAVIS-2017) are publicly available. |
| Dataset Splits | Yes | We evaluate FlowFeat on semi-supervised video object segmentation (VOS) using 30 validation sequences from DAVIS-2017 (CC BY-SA 4.0, [40]). Adhering to the setting of Banani et al. [4], we train the probes on the NYUv2's training set (24231 images) and evaluate the models on 480 × 480 centre crops of the 1449 validation images [36]. |
| Hardware Specification | Yes | To train one model, we use a single GPU with 46GB of memory. ... In wall-clock time with one A40 GPU, the training takes only 24 hours and 3 days for YouTube-VOS and Kinetics, respectively. |
| Software Dependencies | No | The paper mentions using the AdamW optimiser and various models such as SEA-RAFT, RAFT, and SMURF, but it does not specify versions for the programming language or libraries, such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The training proceeds with minibatches of 128 images, input resolution 224 × 224 and AdamW optimiser [24, 31] with learning rate 10^-4 and no weight decay. For the hyperparameters, we empirically set λ = 0.1, σ = 0.1 and γ = 1.0 ... We train FlowFeat for 500 epochs on YouTube-VOS and for 100 epochs on Kinetics. |
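
The Pseudocode row mentions obtaining a linear transform A with ridge regression and then computing a reconstruction loss against optical flow. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The function names, array shapes, the mean-squared-error loss, and the role of the regularisation strength `lam` are all assumptions made for illustration; the report does not say which of the paper's hyperparameters (λ, σ, γ) corresponds to the ridge term.

```python
import numpy as np


def ridge_linear_transform(features, flow, lam=0.1):
    """Closed-form ridge regression: A = (X^T X + lam * I)^-1 X^T U.

    features: (N, D) per-pixel feature vectors (hypothetical shape).
    flow:     (N, 2) per-pixel optical-flow vectors (hypothetical shape).
    lam:      ridge regularisation strength (illustrative value).
    """
    d = features.shape[1]
    gram = features.T @ features + lam * np.eye(d)
    # Solve the regularised normal equations instead of inverting explicitly.
    return np.linalg.solve(gram, features.T @ flow)


def reconstruction_loss(features, flow, transform):
    """Mean squared error between the linearly decoded flow and the target."""
    return float(np.mean((features @ transform - flow) ** 2))


# Synthetic check: flow that is exactly linear in the features
# should be recovered almost perfectly with a tiny regulariser.
rng = np.random.default_rng(0)
teacher_feats = rng.normal(size=(256, 16))
true_map = rng.normal(size=(16, 2))
flow_targets = teacher_feats @ true_map

A = ridge_linear_transform(teacher_feats, flow_targets, lam=1e-6)
loss = reconstruction_loss(teacher_feats, flow_targets, A)
```

In the setup described in the table, the transform would be fitted on teacher features and the loss evaluated on student features for an overlapping crop; this sketch uses a single feature matrix for both steps to stay self-contained.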