Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

Authors: Stefan Stojanov, David Wendt, Seungwoo Kim, Rahul Venkatesh, Kevin Feigelis, Klemen Kotar, Khai Loong Aw, Jiajun Wu, Daniel L Yamins

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Opt-CWM achieves state-of-the-art performance for motion estimation on real-world videos while requiring no labeled data.
Researcher Affiliation	Academia	Stefan Stojanov, Stanford University; David Wendt*, Stanford University; Seungwoo Kim*, Stanford University; Rahul Mysore Venkatesh*, Stanford University; Kevin Feigelis, Stanford University; Klemen Kotar, Stanford University; Khai Loong Aw, Stanford University; Jiajun Wu, Stanford University; Daniel L.K. Yamins, Stanford University
Pseudocode	No	The paper describes the methodology in detail using prose and mathematical equations (e.g., Equation 1, Equation 2, Equation 3) and diagrams (e.g., Figure 1, Figure 2, Figure 3), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	We include code and instructions in the supplement, and will make this code publicly available.
Open Datasets	Yes	Our main datasets for evaluation are TAP-Vid DAVIS and TAP-Vid Kinetics [10], the DAVIS [34] and Kinetics [26] datasets with human flow and occlusion annotations, along with the synthetic Kubric [18] dataset where ground-truth flows and occlusions are known.
Dataset Splits	No	The paper states that it trains on Kinetics-400 [26] and a custom BVD dataset, and evaluates on TAP-Vid DAVIS, TAP-Vid Kinetics [10], DAVIS [34], Kinetics [26], and Kubric [18] datasets, using specific evaluation protocols like 'TAP-Vid First' and 'TAP-Vid Constant five-Frame Gap (CFG)'. However, it does not explicitly provide percentages, sample counts, or specific predefined splits for the training/validation sets of the Kinetics-400 or BVD datasets.
Hardware Specification	Yes	It takes approximately 4 days to train 800 epochs on a TPU v5-128 pod. We pre-train CWM on the Kinetics-400 dataset [26], without requiring any specialized temporal downsampling.
Software Dependencies	No	The paper mentions the use of the AdamW [30] optimizer and references architectures like ViT-B [12] and VideoMAE [41], but it does not specify explicit version numbers for any software libraries, programming languages, or frameworks used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	Table 4: Default pre-training setting of CWM (config: value). optimizer: AdamW [30]; base learning rate: 1.5e-4; weight decay: 0.05; optimizer momentum: β1, β2 = 0.9, 0.95 [9]; accumulative batch size: 4096; learning rate schedule: cosine decay [29]; warmup epochs [16]: 40; total epochs: 800; flip augmentation: no; augmentation: MultiScaleCrop [46]
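
To make the learning-rate settings quoted in the Experiment Setup row concrete, here is a minimal Python sketch of a warmup-plus-cosine-decay schedule using the reported values (base learning rate 1.5e-4, 40 warmup epochs, 800 total epochs). It is an illustration only, not the authors' code. The function name `lr_at_epoch`, the linear shape of the warmup, the decay floor `min_lr = 0`, and using the base learning rate directly (with no batch-size scaling) are all assumptions that the quoted table does not state.

```python
import math

# Values reported in Table 4 of the paper, as quoted in the row above.
BASE_LR = 1.5e-4
WARMUP_EPOCHS = 40
TOTAL_EPOCHS = 800


def lr_at_epoch(epoch: float, min_lr: float = 0.0) -> float:
    """Return the learning rate at a (possibly fractional) epoch.

    Assumptions, not stated in the quoted table: the warmup is linear
    starting from 0, the cosine decays to `min_lr` at the final epoch,
    and BASE_LR is used as-is without any batch-size scaling.
    """
    if epoch < WARMUP_EPOCHS:
        # Linear warmup from 0 up to BASE_LR.
        return BASE_LR * epoch / WARMUP_EPOCHS
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    progress = min(progress, 1.0)
    # Half-cosine decay from BASE_LR down to min_lr.
    return min_lr + 0.5 * (BASE_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 20, 40, 420, 800):
        print(f"epoch {e:3d}: lr = {lr_at_epoch(e):.3e}")
```

Under these assumptions the rate peaks at epoch 40, falls to half the base rate midway through the decay phase (epoch 420), and reaches the floor at epoch 800. Anyone reproducing the paper should check the supplement for the exact warmup shape and any learning-rate scaling rule.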