Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Authors: Stefan Stojanov, David Wendt, Seungwoo Kim, Rahul Venkatesh, Kevin Feigelis, Klemen Kotar, Khai Loong Aw, Jiajun Wu, Daniel L Yamins
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Opt-CWM achieves state-of-the-art performance for motion estimation on real-world videos while requiring no labeled data. |
| Researcher Affiliation | Academia | Stefan Stojanov Stanford University David Wendt* Stanford University Seungwoo Kim* Stanford University Rahul Mysore Venkatesh* Stanford University Kevin Feigelis Stanford University Klemen Kotar Stanford University Khai Loong Aw Stanford University Jiajun Wu Stanford University Daniel L.K. Yamins Stanford University |
| Pseudocode | No | The paper describes the methodology in detail using prose and mathematical equations (e.g., Equation 1, Equation 2, Equation 3) and diagrams (e.g., Figure 1, Figure 2, Figure 3), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We include code and instructions in the supplement, and will make this code publicly available. |
| Open Datasets | Yes | Our main datasets for evaluation are TAP-Vid DAVIS and TAP-Vid Kinetics [10], the DAVIS [34] and Kinetics [26] datasets with human flow and occlusion annotations, along with the synthetic Kubric [18] dataset where ground-truth flows and occlusions are known. |
| Dataset Splits | No | The paper states that it trains on Kinetics-400 [26] and a custom BVD dataset, and evaluates on TAP-Vid DAVIS, TAP-Vid Kinetics [10], DAVIS [34], Kinetics [26], and Kubric [18] datasets, using specific evaluation protocols like 'TAP-Vid First' and 'TAP-Vid Constant five-Frame Gap (CFG)'. However, it does not explicitly provide percentages, sample counts, or specific predefined splits for the training/validation sets of the Kinetics-400 or BVD datasets. |
| Hardware Specification | Yes | It takes approximately 4 days to train 800 epochs on a TPU v5-128 pod. We pre-train CWM on the Kinetics-400 dataset [26], without requiring any specialized temporal downsampling. |
| Software Dependencies | No | The paper mentions the use of the AdamW [30] optimizer and references architectures like ViT-B [12] and VideoMAE [41], but it does not specify explicit version numbers for any software libraries, programming languages, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table 4: Default pre-training setting of CWM (config: value). optimizer: AdamW [30]; base learning rate: 1.5e-4; weight decay: 0.05; optimizer momentum: β1, β2 = 0.9, 0.95 [9]; accumulative batch size: 4096; learning rate schedule: cosine decay [29]; warmup epochs [16]: 40; total epochs: 800; flip augmentation: no; augmentation: MultiScaleCrop [46] |
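
To make the learning rate schedule in the Experiment Setup row concrete, here is a minimal Python sketch of a warmup-plus-cosine-decay schedule using the quoted values (base learning rate 1.5e-4, 40 warmup epochs, 800 total epochs). The linear warmup from zero, the decay to a minimum of zero, and the function name `learning_rate` are assumptions made for illustration. The quoted table does not specify them, nor whether the base rate is scaled by batch size, so this is not the authors' implementation.

```python
import math

# Values copied from the quoted Table 4 settings.
BASE_LR = 1.5e-4
WARMUP_EPOCHS = 40
TOTAL_EPOCHS = 800


def learning_rate(epoch: float) -> float:
    """Return the learning rate at a (possibly fractional) epoch.

    Assumed shape: linear warmup from 0 to BASE_LR, then cosine decay
    from BASE_LR down to 0 at TOTAL_EPOCHS. Neither the warmup shape
    nor the minimum rate is stated in the quoted table.
    """
    if epoch < WARMUP_EPOCHS:
        # Linear ramp during the warmup phase.
        return BASE_LR * epoch / WARMUP_EPOCHS
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at a few epochs to show the overall shape.
    for epoch in (0, 20, 40, 420, 800):
        print(epoch, learning_rate(epoch))
```

Under these assumptions the rate peaks at the end of warmup (epoch 40) and falls to half the base rate midway through the decay phase (epoch 420).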